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Preface 


The  n-bit  simulation  tool  developed  by  Captain  Gary  A.  Klein  has 
been  modified  to  include  the  capability  of  simulating  two' s- complement 
machines  which  truncate  results,  as  opposed  to  rounding  them.  Anyone 
who  uses  this  tool  should  realize  the  effect  different  number  represen- 
tation schemes  can  have  on  arithmetic  computations. 

It  is  easy  to  state  that  we  need  a computer  with  a longer  word- 
length  whenever  we  fail  to  achieve  a required  accuracy  from  a piece  of 
software.  Sometimes,  however,  this  required  accuracy  can  be  obtained 
by  employing  better  programming  techniques  on  the  original  machine.  In 
writing  this  thesis,  I am  primarily  interested  in  applying  numerical 
analysis  techniques  to  software  to  obtain  a required  accuracy  with  the 
constraints  of  a given  fixed  wordlength  and  number  representation  and 
handling  schemes. 

I would  especially  like  to  thank  my  thesis  advisor.  Dr.  Gary  B. 
Lamont,  for  his  guidance,  support,  and  many  suggestions  when  they  were 
needed  most.  I would  also  like  to  thank  my  two  committee  members.  Dr. 
Thaddeus  L.  Regulinski  and  Dr.  Peter  S.  Maybeck,  for  their  advice  and 
encouragement.  A special  thanks  goes  to  my  thesis  sponsor.  Dr.  Donald 
L.  Moon,  for  the  many  hours  he  spent  helping  me  get  started  and  for 
providing  some  much  needed  guidance  throughout  this  period.  Lastly,  I 
would  like  to  thank  my  typist,  Cheryl  Gilliland,  for  her  diligent  work. 
I thank  all  of  you  for  your  continued  patience,  support,  and  under- 
standing that  has  made  it  possible  for  me  to  complete  this  thesis. 
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Abstract 


Errors  due  to  finite  wordlangth  are  unavoidable  when  aircraft 

signal  processing  operations  such  as  flight  control,  navigation,  and 

) 

fire  control  are  implemented  on  a digital  computer.  To  reduce  these 
errors  to  tolerable  levels,  longer  wordlengths  can  sometimes  be  employed. 
The  effects  of  some  of  the  errors,  such  as  those  due  to  arithmetic  series 
truncation,  machine  roundoff,  and  quantization  of  system  coefficients, 
can  be  lessened  somewhat  by  appropriate  numerical  analysis  techniques. 

An  n-bit  simulator  which  runs  on  Control  Data  Corporation  (CDC) 
6600/CYBER  74  computer  systems  was  modified  and  then  used  to  evaluate 
the  accuracy  of  a flight  navigation  routine  coded  in  FORTRAN.  The  rou- 
tines were  executed  without  the  simulator  to  obtain  results  used  for 
benchmarking.  The  n-bit  simulator  was  employed  to  simulate  the  numeri- 
cal characteristics  of  the  AN/AYK-15A  digital  processor.  Error  plots 
were  constructed  which  show  the  maximum  errors  occurring  within  small 
plotting  intervals  plotted  against  each  individual  input  value.  These 
plots  were  used  to  aid  visually  in  analyzing  the  error  characteristics 
of  the  avionics  routine  as  it  would  be  implemented  on  the  AN/AYK-15A. 

A critical  analysis  of  the  error  plots  obtained  showed  that  routines 
which  are  coded  using  single-precision  floating-point  arithmetic  are 
prone  to  errors  which  exceed  the  error  bounds  specified  for  the  routines. 
This  occurs  even  though  range  reductions  in  the  trigonometric  function 
approximations  are  accomplished  using  extended  precision. 
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I Introduction 


When  aircraft  signal  processing  operations  such  as  flight  control, 
navigation,  and  fire  control,  are  implemented  on  a digital  computer, 
errors  result  which  are  due  to  finite  computer  wordlengths.  Longer 
wordlengths  must  be  employed  sometimes  to  reduce  these  errors  to  toler- 
able levels.  Appropriate  numerical  analysis  and  design  techniques  can 
reduce  the  effects  of  some  of  the  errors,  such  as  those  due  to  arith- 
metic series  truncation,  machine  roundoff,  and  quantization  of  system 
coefficients. 

Two  basic  approaches  might  be  considered  when  trying  to  analyze 
the  numerical  error  characteristics  of  a programmed  routine.  First, 
consider  the  routine  as  given  and  determine  the  wordlengt'h  needed  to 
give  a desired  accuracy.  Second,  considering  specific  hardware  charac- 
teristics and  the  associated  routine  algorithm,  attempt  to  implement  a 
modified  algorithm  such  that  the  resulting  errors  are  minimized.  In 
this  investigation,  the  second  approach  was  selected  as  the  primary 
mode  of  analysis. 

The  Air  Force  Avionics  Laboratory  has  written  a development  speci- 
fication for  the  AN/AYK-15A  computer  processor  (Ref  2).  This  processor 
will  be  used  in  the  Digital  Avionics  Information  System  (DAIS)  Inte- 
grated Test  Bed  and  is  a candidate  for  a follow-on  flight  test  program 
in  the  F-16  aircraft.  This  specification  establishes  the  performance, 
design,  development,  and  test  requirements  for  the  Processor  prime  item. 
The  instruction  set  specified  is  the  MIL-STD-1750  airborne  computer 
instruction  set  (Ref  49).  MIL-STD-1750  defines  the  instruction  set, 
mnemonics,  and  data  format  requirements  for  airborne  computers. 


Purpose 


The  purpose  of  this  investigation  is  to  develop  tools  and  techniques 
which  can  be  used  to  determine  if  a given  computer  can  solve  a given 
avionics  signal  processing  problem  within  certain  specified  error  and 
time  tolerances.  Specifically,  the  following  goals  were  defined: 

- develop  a tool  to  simulate  accurately  the  computational  charac- 
teristics of  any  digital  processor  being  studied 

- produce  common  library  routines  which  are  optimal  in  the  sense 
that  they  try  to  maximize  both  absolute  and  relative  accuracy 
and  at  the  same  time  minimize  the  number  of  instructions 
(especially  multiplications)  required 

demonstrate  the  effectiveness  of  the  simulation  tool  mentioned 
above  to  analyze  the  error  characteristics  of  a given  class  of 
algorithms  and  associated  routines. 

Approach 

There  are  four  basic  requirements  which  must  be.  satisfied  in  order 
to  meet  these  goals.  The  first  requirement  is  that  the  n-bit  simulator 
must  accurately  simulate  the  numerical  properties  of  processors  which 
use  the  airborne  computer  instruction  set  defined  in  the  Mll.-STD-l 750 
document.  The  second  requirement  is  that  library  routines  for  sine, 
cosine,  arctangent,  and  square  root  be  developed  which  produce  results 
as  accurately  as  possible.  These  routines  must  reflect  the  error  char- 
acteristics they  would  have  if  they  were  implemented  on  the  simulated 
computer.  The  third  requirement  is  to  demonstrate  how  the  n-hlt  simu- 
lator can  be  used  to  perform  an  error  analysis.  This  requirement  was 
constrained  to  forward  error  analysis.  The  fourth  requirement  is  to 
indicate  how  the  techniques  applied  to  one  program  representing  an 
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algorithm  can  ho  app  1 i cd  to  n larger  class  of  programs.  If  tln>  techni- 
ques demonstrated  can  ho  applied  to  a larger  class  of  programs,  then 
programs  from  this  larger  class  can  he  analyzed  to  determine  the  corn- 
pat  ihl  lity  of  the  programs  with  tin'  computer  being  simulated. 

Several  procedures  or  approaches  are  available  for  assessing  the 
quality  of  floating-point  mathematical  software.  Typical  analysis 
schemes  can  be  classified  as  follows: 

1)  error-bounding  schemes  (Kefs  9;  31;  50;  and  '>3), 

2)  forward  error  analysis  (Kefs  10;  25;  29;  52;  and  62), 

3)  backward  error  analysis  (Kefs  48;  59;  62;  65;  and  6t>), 

4)  multiple-precision  arithmetic  (Kefs  5;  17;  35;  and  68) , 

5)  perturbation  analysis  (Kef  63),  and 

6)  significance  arithmetic  (Kefs  3 and  6). 

F.ach  of  these  six  approaches  was  considered,  with  forward  error  analysts 
being  chosen.  Forward  error  analysis  requires  that  computed  results  bo 
compared  to  higher  precision  reference  values.  This  requirement  was  met 
by  using  the  Control  Data  Corporation  (CDC)  CYBKR  74  computer  with  its 
60-blt  word  length  to  compute  the  higher- precision  results.  These  results 
are  then  available  to  be  compared  to  results  which  would  be  obtained 
using  the  computer  on  which  the  routines  would  normally  be  executed. 

Thus,  the  necessary  tools  and  techniques  for  conducting  a forward  error 
analysis  can  be  developed  for  use  with  the  CDC  CYUF.R  74  computer.  Since 
forward  error  analyses  reveal  quantization,  roundoff,  and  truncation 
errors  as  they  actually  occur  (as  opposed  to  Just  giving  uppe  limits), 
this  method  was  selected  over  the  other  error  analysis  schemes. 

There  are  three  primary  sources  of  errors  In  any  numerical  result: 
transmitted  errors,  analytic  truncation  errors,  and  generated  errors 
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(Ref  11:125).  Transmitted  errors  are  those  which  occur  in  the  original 
input  data.  Analytic  errors  occur  when  finite  processes  are  substituted 
for  essentially  infinite  processes  in  the  mathematical  algorithm.  The 
third  source  of  errors,  generated  errors,  represent  the  errors  actually 
generated  within  the  computer  program  if  exact  input  data  is  entered. 
Generated  errors  reflect  the  method  of  rounding  or  truncation  utilized 
and  other  design  characteristics  such  as  the  number  system  base,  or 
machine  radix.  For  the  purposes  of  this  investigation,  when  a program 
is  being  evaluated  solely  for  its  accuracy,  the  transmitted  errors  are 
not  considered,  since  they  would  normally  occur  during  the  conversion 
of  analog  signals  to  digital  signals.  A simplified  diagram  depicting 
the  method  utilized  by  the  n-bit  simulator  to  perform  a forward  error 
analysis  is  shown  in  Fig.  1.  For  simulations  where  the  transmitted 
errors  are  included,  the  n-bit  simulator  will  be  employed  as  indicated 
in  Fig.  2.  In  both  diagrams,  the  area  within  the  dotted  lines  repre- 
sents the  phase  of  analysis  using  n-bit  accuracy.  A general  discussion 
of  the  philosophical  questions  concerning  the  isolation  of  transmitted 
errors  is  presented  by  Kuki  and  Ascoly  (Refs  41  and  42). 

Tools  and  Techniques 

Several  software  tools  exist  to  aid  the  analyst  in  performing  a 
forward  error  analysis.  Two  of  these  are  an  n-bit  wordlength  simulator 
(Ref  37)  and  a floating-point  simulator  (Ref  25).  The  n-bit  simulator 
was  selected  as  the  primary  tool  for  several  reasons.  First,  with  a few 
modifications  it  should  be  able  to  simulate  the  numerical  properties  of 
any  processor  following  the  MIL-STD-1750  specification;  second,  it  is 
well  documented;  and  third,  it  has  been  tested  on  a CDC  CYBER  74  com- 
puter system,  thereby  eliminating  the  need  to  use  multiple-precision 


packages  in  order  to  obtain  benchmark  data.  This  also  eliminated  the 
need  to  use  two  computers,  since  the  higher-precision  results  and  the 
results  of  the  simulated  computer  can  be  made  available  at  the  same  time 
for  immediate  comparison  using  the  accuracy  of  the  CI)C  CYBER  74  computer. 
The  n-bit  simulator  gives  the  user  the  option  of  selecting  different 
wordlengths  within  any  program,  thereby  making  it  a valuable  tool  for 
perturbation  analysts,  since  the  processor  wordlength  is  also  considered 
to  be  a candidate  for  perturbations.  It  is  assumed  that  the  reader  is 
familiar  with  the  work  by  Klein  (Kef  37). 

Assumptions /Constraints 

Several  assumptions  were  made  during  the  course  of  conducting  this 
investigation.  Five  of  those  assumptions  are  the  same  as  the  first  five 
assumptions  by  Klein  (Ref  37:6-8).  These  five  assumptions  cover  various 
aspects  of  the  n-bit  simulator  such  as  its  capabilities  and  the  environ- 
ment in  which  it  is  executed.  Klein* s sixth  assumption  has  been  changed 
due  to  the  modifications  made  to  the  n-bit  simulator.  The  n-bit  simu- 
lator is  now  able  to  simulate  computers  vising  one's  complement,  two's 
complement,  or  sign-magnitude  arithmetic.  Also,  binary,  quaternary, 
octal,  or  hexadecimal  numerical  data  representations  can  now  bo  simu- 
lated. There  is  still  no  provision  for  simulating  decimal  representa- 
tions. The  first  part  of  Klein's  sixth  assumption,  that  floating-point 
mantissas  will  be  normalized,  remains  the  same.  Five  other  assumptions 
that  were  made  are: 

the  CDC  CYBER  74  mantissa  length  of  48  bits  provides  sufficient 
accuracy  (error  less  than  one  tenth  of  one  percent  of  that  pro- 
duced by  the  simulated  computer)  for  benchmarking  purposes  in  a 


forward  error  analysis. 


fixed-point  numbers  would  bo  used  primarily  for  addressing,  sub- 
scripting, and  logical  values,  while  floating-point  numbers  would 
be  used  in  performing  the  numerical  computations  (Ref  2), 
execution  time  is  a "precious  resource"  on  avionics  computers, 
meaning  that  flight  routines  must  execute  as  efficiently  as 
possible  while  still  maintaining  the  required  accuracy  (Ref  57), 
coding  an  algorithm  in  FORTRAN  for  the  purpose  of  executing  it 
on  the  n-bit  simulator  will  not  significantly  alter  its  error 
characteristics  from  those  it  would  have  if  implemented  on  an 
avionics  computer,  since  the  n-bit  simulator  modifies  all  inter- 
mediate results  as  well  as  the  final  results,  and 
avionics  algorithms  found  in  Ref  57  are  representative  of  general 
avionics  algorithms  which  might  be  programmed  for  use  in  differ- 
ent aircraft  and  are  therefore  sufficient  to  be  employed  in  dev- 
eloping a method  for  evaluating  a general  class  of  algorithms. 
Since,  these  algorithms  ate  for  the  F-16  aircraft,  they  can  be 
used  to  test  the  error  characteristics  of  the  AN/AYK-15A  avionics 
processor  which  is  a candidate  for  use  in  F-16  aircraft. 

Chapter  Synopsis 

The  second  chapter  of  this  investigation  contains  a summary  of  the 
modifications  made  to  the  n-bit  simulator  to  allow  it  to  simulate  accur- 
ately processors  which  use  the  airborne  computer  instruction  set  defined 
by  MIL-STI>-1750,  and  in  particular,  the  AN/AYK-15A  processor.  The  third 
chapter  contains  a discussion  of  Monte  Carlo  techniques  which  were  ap- 
plied when  conducting  a forward  error  analysis.  Topics  discussed  include 
generating  and  testing  pseudo-random  numbers  and  obtaining  criteria  to 
use  to  decide  when  to  stop  testing.  The  fourth  chapter  contains  a 
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discussion  of  the  procedures  used  to  develop  sine,  cosine,  and  square 
root  function  approximations.  There  was  no  approximation  developed  for 
the  arctangent  function.  Many  different  approximations  and  methods  were 
tested,  with  only  the  "best"  being  discussed.  The  fifth  chapter  con- 
tains a detailed  discussion  of  one  flight  routine  (representing  a por- 
tion of  an  algorithm)  and  the  associated  error  analysis  conducted.  This 
chapter  presents  a method  for  using  the  n-bit  simulator  in  forward  error 
analysis  studies  on  this  one  flight  routine.  The  techniques  discussed 
are  shown  to  be  applicable  for  testing  other  routines  as  well.  The 
sixth  chapter  contains  a discussion  of  the  quasilinearization  method  as 
it  might  be  applied  in  a forward  error  analysis.  The  last  chapter  con- 
tains the  conclusions  drawn  from  the  error  analyses  and  associated  re- 
sults and  discusses  recommendations  for  future  research. 
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II  N-bit  Simulator 


The  n-bit  simulator  (Kef  37)  consists  of  two  major  components.  The 
first,  the  preprocessor,  reads  as  input  a FORTRAN  source  program  repre- 
senting an  algorithm,  translates  it  into  another  FORTRAN  source  program 
with  all  the  arithmetic  operations  replaced  hv  subroutine  calls,  and 
writes  it  to  an  output  file.  The  second  part  is  a collection  of  sub- 
routines the  modified  FORTRAN  source  program  calls  when  executing. 

These  subroutines  perform  t he  arithmetic  operation  they  replaced  vising 
tiie  wordlength  of  the  host  computer  and  then  they  modify  t lie  result  to 
reflect  the  properties  of  the  simulated  computer  processor  wordlength. 
Some  of  the  important  characteristics  of  the  n-bit  simulator  are  its 
capabilities  to  handle  both  fixed-point  (limited)  and  floating-point 
arlthmet ic,  its  capability  to  perform  either  rounding  or  truncation, 
and  its  documentation  and  overall  design  which  allowed  it  to  be  easily 
modified.  In  this  chapter  are  discussed  the  original  n-bit  simulator, 
its  capabilities,  and  a series  of  modifications  made  to  the  preprocessor 
and  selected  subroutines.  Options  were  added  to  allow  the  user  to 
specify  the  machine  radix,  the  number  of  guard  bits  used,  and  the  type 
of  arithmetic  performed  (one's  complement  or  two's  complement).  The 
user  also  now  has  the  capability  of  specifying  the  fixed-point  word- 
length  separately  from  the  float ing-point  wordlength.  A detailed  dis- 
cussion of  the  original  n-bit  simulator  is  presented  by  Klein  (Ref  37). 
An  vipdated  user's  manual  is  shown  in  Appendix  A. 

Preprocessor 

The  preprocessor  changes  all  arithmetic  assignment  statements  into 
a series  of  one  or  more  subroutine  calls.  F.ach  arithmetic  operator 


(+,  -,  *,  /,  **,  and  ■)  is  replaced  by  one  subroutine  call,  thereby 
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allowing  both  intermediate  and  final  results  to  reflect  accurately  the 
characteristics  of  the  simulated  computer.  The  only  change  made  to  the 


preprocessor  was  the  addition  of  one  more  parameter  In  the  subroutine 
calls.  This  extra  parameter,  which  Is  the  name  of  an  array  which  holds 
the  overflow  and  underflow  limits  for  floating  point  values,  was  added 
to  allow  faster  checking  for  floating-point  overflow  and  underflow.  The 
values  in  the  original  simulator  were  stored  in  an  integer  array  and  the 
SHIFT  function  was  used  to  prevent  conversion  of  the  values  to  fixed- 
point  format. 

SKTNBTT  Sub  rout ine 

SKTNBIT  is  the  name  of  a subroutine  which  is  called  as  the  first 
executable  statement  of  the  FORTRAN  program  and  wherever  else  required. 
This  subroutine  takes  the  input  parameters  which  are  the  user  options 
and  converts  them  into  values  the  numerical  rout ines  need.  The  ten  user 
options  associated  with  any  call  to  the  SKTNBIT  routine  are  shown  in 
Fig.  3. 


User  specifications: 

1) 

NBITS 

Floating  point  wordlength 

2) 

MANTSA 

Mantissa  length  (excluding  sign) 

3) 

IGUARO 

Number  of  guard  digits 

4) 

IEXPNT 

Exponent  length  (including  sign) 

5) 

IPTPOS 

Binary  point  position 

6) 

IRNDTR 

Round  or  truncate 

7) 

IONTWO 

One's  complement  or  two's  complement  arithmetic 

8) 

ITYPF. 

Machine  radix 

9) 

MESSRS 

Message  suppression 

10) 

IFIXD 

Fixed  point  wordlength 

Fig.  3.  User  Specifications  (Options) 


Each  entry  shows  the  order  the  option  occurs  in  the  parameter  list,  the 
variable  name,  and  its  meaning.  The  variable  names  will  be  used  in 
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equations  later  in  this  chapter.  The  output  values  are  stored  in  two 
arrays,  the  first  (KEY)  holding  fixed-point  values  and  the  second  (THEY) 


holding  floating-point  values.  These  arrays  are  the  last  two  parameters 
for  SETNBIT.  The  ten  user  options  will  he  explained  first  followed  by 
the  way  the  two  arrays  are  filled. 

Opt  ion  One.  The  first  option  is  used  to  indicate  the  total  number 
of  bits  in  a floating-point  word.  As  an  example  of  this,  the  AN/AYK-1SA 
digital  processor  has  a machine  word length  of  lb  bits,  with  single  pre- 
cision fixed-point  words  being  represented  by  lb  bits,  or  one  machine 
word,  and  single-precision  floating-point  words  being  represented  by  32 
bits,  or  two  machine  words.  When  simulating  with  single  precision 
floating-point  words,  the  value  of  option  one  would  be  set  to  32,  or 
the  number  of  bits  in  the  floating-point  word.  Since  some  smaller 
computers  use  different  numbers  of  machine  words  to  represent  fixed- 
point  and  floating-point  words  (Kef  49) , just  as  the  AN/AYK-1SA  does, 
option  ten  was  added  to  allow  the  user  to  specify  a fixed-point  word- 
length  different  from  the  floating-point  wordlength  specified  in  option 
one. 

Option  Two.  The  second  option  allows  the  user  to  specify  the 
number  of  bits  in  the  mantissa  of  the  floating-point  word,  not  counting 
the  sign  of  the  mantissa.  This  is  the  mantissa  length  for  all  final 
variable  assignments. 

Option  Three.  The  third  option,  which  replaces  the  one  in  the 
original  simulator  which  was  used  to  indicate  single,  double,  or  triple 
precision  computation,  is  used  to  indicate  the  number  of  guard  bits 
employed  by  the  simulated  computer.  The  uses  of  guard  digits  as  defined 
by  Kukl  and  Cody  (Kef  43:224)  are  as  follows.  For  addition,  guard 


digits  are  primarily  used  to  retain  shifted  digits  of  the  operand  with 
the  smaller  exponent  at  the  time  of  the  exponent  alignment  of  the 
operands.  Guard  digits  also  participate  in  the  right  shift  of  the  inter- 
mediate sum  in  ease  of  a carry.  Using  only  one  guard  digit,  the  relative 
accuracy  of  the  sum  of  two  exact ly  represented  numbers  may  be  protected 
to  within  machine  precision.  The  representation  of  the  product  of  two 
N-dlglt  signlficands  requires  2N  digits.  If,  of  these  2N  digits,  NfK 
high  order  digits  are  actually  developed  before  the  post  normalization 
of  the  result,  then  K guard  digits  have  been  used  for  multiplication. 
Since  the  multiplication  of  two  normalized  operands  requires  post  normal 
ization  of  at  most  one  digit,  thete  will  be  either  K or  K-l  guard  digits 
available  for  possible  rounding  after  the  postnormalization.  If  no 
guard  digits  are  used , a number  could  be  changed  by  multiplying  it  by 
1.0  (Kef  43:224).  It  is  still  possible  to  indicate  single,  double,  or 
triple  precision  computations  by  setting  this  option  to  the  required 
number  of  extra  bits.  As  an  example  of  how  this  option  is  used,  if 
MANTSA  is  assigned  the  value  23  and  1GDAK0  the  value  2,  the  mult {pli- 
ca t i on 

TEMP  ■>  T.1  * T2  (l) 

would  be  performed  using  25>-bit  mantissas  and  the  result  would  be  stored 
in  the  variable  location  TEMP  using  a 23-bit  mantissa.  For  more  infor- 
mation on  the  effects  of  guard  digits  see  Kuek,  Parter,  and  Sameh  (Kef 
39),  Kukl  and  Cody  (Kef  43),  and  Cody  (Kef  12). 

Option  Four.  Option  four  Is  the  same  as  one  of  the  options  in  the 
original  simulator  and  is  used  to  show  the  exponent  length  for  floating- 
point words.  The  sign  of  the  exponent  is  included  in  the  length  of  the 
exponent,  so  the  condition 
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NBITS  *=  MANTSA  + IEXPNT  + 1 


(2) 


must  hold  true. 

Option  Five.  The  fifth  option  is  the  same  as  one  in  the  original 
simulator  and  allows  the  user  to  specify  whether  the  binary  point  is  to 
the  left  of  the  mantissa  (making  the  mantissa  a fraction)  or  to  the 
right  of  the  mantissa  (making  the  mantissa  an  integer). 

Option  Six.  This  option  is  also  the  same  as  one  in  the  original 
simulator  and  is  used  to  specify  whether  rounding  or  truncation  will  be 
performed. 

Option  Seven.  This  option  was  added  to  allow  the  user  to  specify 
whether  the  machine  being  simulated  words  in  sign  plus  magnitude,  sign 
plus  one's  complement,  or  sign  plus  two's  complement  floating-point 
arithmetic.  When  rounding  is  being  performed,  the  only  differences  are 
in  the  overflow  and  underflow  values.  When  truncation  is  being  per- 
formed, however,  one  other  rather  subtle  difference  appears,  that  being 
that  sign  plus  two’s  complement  negative  numbers  are  truncated  away 
from  zero  while  sign  plus  magnitude  and  sign  plus  one's  complement 
negative  numbers  are  truncated  toward  zero.  This  concept,  shown  gra- 
phically in  Fig.  4 for  fixed-point  numbers,  is  explained  in  detail  for 
both  fixed-point  and  floating-point  numbers  by  Oppenheim  and  Weinstein 
(Ref  55:406-413).  In  Fig.  4,  x represents  the  number  before  truncation 
or  rounding,  Q(x)  represents  the  number  after  truncation  or  rounding, 
and  the  variable  b represents  the  wordlength  for  a fixed-point  number. 
Rounding  and  truncation  are  performed  by  the  special  subroutines 
ONETRNC,  TWOTRNC,  and  ROUNDER  which  are  explained  in  detail  later  in 
this  chapter.  For  more  information  on  the  effects  of  rounding  and 
truncation,  see  Kuck,  Parker,  and  Sameh  (Ref  39),  Tsao  (Ref  61),  Kuki 


and  Cody  (Kef  43),  Cody  (Kef  12),  and  Kaneka  and  Liu  (Kef  30). 

Option  Eight.  Option  eight  was  added  to  allow  the  user  to  specify 
the  machine  base,  or  radix,  of  the  computer  being  simulated,  provided 
the  radix  is  a multiple  of  two.  Binary,  octal,  and  hexadecimal  machines 
arc  examples  of  computers  having  a machine  base  which  is  a multiple  of 
two.  The  machine  base  defines  the  actual  value  of  the  exponent  and 
affects  not  only  the  overflow  and  underflow  limits,  but  also  the  way  the 
mantissa  Is  normalized.  In  the  following  examples,  the  mantissa  sign 
bit  is  first,  the  mantissa  is  second,  and  the  exponent  is  last,  since 
this  is  the  way  computers  following  the  MiL-STD-1750  document  represent 
floating-point  numbers.  Truncation  is  assumed,  and  the  four  bit  pat- 
terns in  Fig.  5 represent  how  the  result  of  dividing  127  by  64,  or  the 
number  1.984375,  would  be  represented  on  binary,  quaternary,  octal,  and 
hexadecimal  machines  respectively. 


Bit  pattern 

Actual  value 

Kadix 

0 1111111  0001 

1.984375 

2 

0 0111111  0001 

1.96875 

4 

0 0011111  0001 

1.9375 

8 

0 0001111  0001 

1.875 

16 

Fig.  5.  Kadix  Effects 

In  each  case,  the  wordlength  is  12  bits,  with  1 bit  for  the  sign,  7 
bits  for  the  mantissa,  and  4 bits  for  the  exponent.  More  information 
on  the  effects  that  the  machine  radix  lias  on  numerical  accuracy  can  he 
found  in  Kuki  and  Cody  (Kef  45),  Kuck,  Taiker,  and  Sameh  (Kef  39), 

Goldberg  (Ref  27),  and  Cody  (Kef  12). 

Option  Nine.  The  ninth  option  is  the  same  as  one  in  the  original 
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simulator  and  Is  used  to  control  the  printing  or  suppression  of  overflow 
and  underflow  messages.  If,  during  any  part  of  the  simulation,  overflow 
occurs  on  the  CIH)  CYBER  74  and  the  value  is  used  again,  the  program  will 
terminate  regardless  of  the  value  of  this  flag. 

Oli 1 1 on  Ten.  This  last  option  was  added  to  allow  the  user  to  specify 
the  fixed-point  wordlength  so  that  correct  overflow  bounds  can  he  estab- 
lished for  both  fixed-point  and  floating-point  computations.  This  allows 
programs  run  on  computers  such  ns  the  AN/AYK-1SA,  where  the  fixed-point 
wordlength  is  different  from  the  floating-point  wordlength,  to  be 
simulated  accurately  without  having  to  call  the  SETNB1T  subroutine  when 
changing  between  fixed-point  and  floating-point  computations. 

KEY.  The  array  KEY  holds  fixed-point  values  which  are  used  by  the 
numerical  subroutines.  The  following  paragraphs  describe  how  the 
elements  of  the  KEY  array  are  filled  from  the  values  specified  as  user 
opt  ions . 

KEY ( 1) . KEY(l)  holds  the  largest  fixed-point  positive  number 
which  can  be  represented  on  the  machine  being  simulated.  Given  the 
IF1X0  input  in  option  ten,  KEY(l)  may  be  computed  by 

KEY ( 1 ) - (2**( IFIXO- l) )-l  (3) 

KEY ( 2 ) . KEY(2)  holds  the  largest  (magnitude)  negative  value 
of  the  simulated  computer  and  is  used  in  checking  for  negative  fixed- 
point  overflow.  For  sign  plus  magnitude  and  sign  plus  one's  complement 
machines,  KEY (2)  may  be  computed  by 

KEY (2)  - -KEY ( 1 ) (4) 

but  for  sign  plus  two's  complement  machines  it  should  be  computed  by 

KEY (2)  - -(2**(1FI\T>-1))  (3) 

KEY ( 3) . KEY(3)  holds  user  option  six,  or  IRNOTR,  specifying 
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rounding  or  truncation.  The  use  of  this  field  is  discussed  further 
later  in  this  chapter  in  the  section  entitled  Special  Subroutines. 

KEY (A) . KEY (A)  holds  user  option  seven,  or  10NTW0,  speci- 
fying one's  complement  or  two's  complement  arithmetic.  The  use  of  this 
field  is  also  discussed  later  in  this  chapter  in  the  section  entitled 
Special  Subroutines. 

KEY (5) . KEY (5)  holds  user  option  nine,  or  MESSGS,  specifying 
print  or  print  suppression. 

KEY (6) . KEY(6)  holds  t he  number  of  mantissa  bits  to  save  on 
all  final  assignments  and  is  filled  with  the  value  of  MANTSA,  or  user 
option  two. 

KEY ( 7 ) . KEY(7)  holds  the  number  of  extra  mantissa  bits  to 
save  during  intermediate  operations  and  is  filled  with  the  value  of 
I GUARD,  or  user  option  throe. 

KEY (8) . KEY(8)  holds  user  option  eight,  or  ITYPE,  specifying 
the  radix  of  the  machine  being  simulated.  The  value  actually  stored  is 
the  exponent  of  two  which  would  give  the  desired  radix  (the  value  3 
represents  octal,  since  2**3’=8). 

TKEY . The  array  named  TKKY  holds  four  values  used  for  overflow  and 
underflow  checking.  While  each  element  of  the  array  can  be  represented 
by  a mathematical  expression,  the  actual  values  arc  computed  using 
shift,  mask,  and  addition  or  subtraction  operations.  In  some  instances, 
an  operand  being  added  or  subtracted  is  unnorroalized,  with  the  answer 
being  normalized  by  the  CDC  CYBER  7A  computer.  If  the  physical  limits 
of  the  CDC  computer  would  be  exceeded  In  trying  to  represent  the  over- 
flow or  underflow  values  of  the  computer  being  simulated,  the  CDC  limits 
arc  used  instead.  The  documented  code  to  implement  the  algorithms  to 
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fill  the  TKEY  array  is  shown  in  Appendix  B.  Those  operands  used  in 
additions  or  subtractions  which  are  unnormalized  are  annotated  as  such. 
In  the  equations  which  follow,  the  variables  are  the  same  as  those 
shown  in  Fig.  3. 

TKEY(l) . The  first  element  of  array  TKEY  holds  the  largest 
positive  floating-point  value  that  can  be  expressed  on  the  computer 
being  simulated.  If,  during  any  operation,  a floating-point  result 
exceeds  this  value,  positive  overflow  will  be  signaled  and  the  result 
will  be  replaced  by  TKEY(l).  The  value  stored  in  TKEY(l)  depends  upon 
the  location  of  the  binary  point  specified  by  IPTPOS,  or  user  option  5. 
If  the  binary  point  is  on  the  left,  TKEY(l)  may  be  computed  using 

TKEY ( 1)  = (2**MANTSA-1) *(2**(ITYPE*( (2**(IEXPNT-1) )-l)- 

MANTSA))  (6) 

If  the  binary  point  is  on  the  right,  TKEY(l)  may  be  computed  using 

TKEY(l)  = (2**MANTSA-1) *(2**(ITYPE*( (2**(IEXPNT-1) )-l) ) ) (7) 

TKEY (2) . The  second  element  of  TKEY  holds  the  largest 
(magnitude)  negative  floating-point  value  representable  by  the  computer 
being  simulated.  If  any  result  is  less  than  TKEY(2),  negative  overflow 
is  signaled  and  the  result  is  replaced  by  TKEY(2).  The  value  stored  in 
TKEY(2)  depends  upon  both  the  location  of  the  binary  point  specified  by 
IPTPOS  and  the  type  of  arithmetic  specified  by  IONTWO,  or  user  option  7. 
If  the  binary  point  is  on  the  left  and  sign  plus  one's  complement 
arithmetic  is  specified,  TKEY(2)  may  be  computed  using 

TKEY (2)  = -(2**MANTSA-1)*(2**(ITYPE*((2**(IEXPNT-1))-1)- 

MANTSA))  (8) 

If  the  binary  point  is  on  the  left  and  sign  plus  two's  complement  arith- 

metic is  specified,  TKEY(2)  should  be  computed  using 
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TKEY (2)  » - (2**( ITYPE*( (2**(IEXPNT-1) ) -1) ) ) (9) 

For  the  cases  where  the  binary  point  has  been  specified  to  be  on  the 
right, 

TKEY (2)  = - (2**MANTSA-1) *(2**(ITYPE*( (2**(IEXPNT-1) )-l) ) ) (10) 

is  used  for  the  sign  plus  one's  complement  case  and 

TKEY (2)  - -(2**(ITYPK*( (2**(1EXPNT-1) )-l)+MANTSA) ) (11) 

is  used  for  the  sign  plus  two's  complement  case. 

TKEY ( 3) . The  third  element  of  TKEY  holds  the  smallest  posi- 
tive floating-point  number  which  is  greater  than  zero  and  still  repre- 
sentable on  the  computer  being  simulated.  If  any  result  is  greater  than 
zero  and  less  than  TKEY(3),  positive  underflow  is  signaled  and  the  result 
is  replaced  by  zero.  For  the  cases  where  the  binary  point  is  specified 
to  be  on  the  left, 

TKEY (3)  ■=  2**(-(ITYPE*( (2**(IEXPNT-1) )-l)+lTYPE) ) (12) 

is  used  when  sign  plus  one's  complement  arithmetic  is  specified  and 

TKEY ( 3)  - 2**(-(ITYPE*(2**(IEXPNT-l) )+ITYPE) ) (13) 

is  used  when  sign  plus  two's  complement  arithmetic  is  specified.  If 
the  binary  point  is  specified  to  be  on  the  right,  the  value  for  TKEY(3) 

Is  computed  using 

TKEY ( 3)  » 2 ** (- ( ITYPE* ( (2 ** ( I EXPNT- 1) ) - 1 )+l TYPE- HANTS A) ) (14) 

for  the  sign  plus  one's  complement  case  and  using  » 

TKEY ( 3)  - 2 ** ( - ( ITYPE  * ( 2 ** ( IEXPNT- 1 ) ) +ITYP E- HANTS A) ) (15) 

for  the  sign  plus  two's  complement  case. 

TKEY (4) . The  fourth  element  of  TKEY  holds  the  smallest 
(magnitude)  negative  floating-point  value  which  is  less  than  zero  and 
still  representable  on  the  computer  being  simulated.  Tf  any  result  is 
greater  than  TKEY (4)  and  less  than  zero,  negative  underflow  is  signaled 
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and  the  result  Is  replaced  by  zero.  If  the  binary  point  has  been 
specified  to  be  on  the  left, 

TKEY (4)  = -(2**(- (ITYPE*( (2**(IEXPNT-1) )-l)+ITYPE) ) ) (16) 

should  be  used  if  sign  plus  one’s  complement  arithmetic  has  been  speci- 
fied and 

TKEY (4)  = -(  (2**(  l TYPE) )+(2**(-MANTSA) ) ) *(2**(-(ITYPE* 

(2** (lEXPNT-1) ) ) ) ) (17) 

should  be  used  if  sign  plus  two’s  complement  arithmetic  has  been  speci- 
fied. For  the  cases  where  the  binary  point  has  been  specified  to  be  on 
the  right, 

TKEY(4)  = - ( 2 ** (- ( li YPE* ( ( 2*  * ( IEXPNT-1) ) -1) +ITYPE-MANTSA) ) ) (18) 

is  used  for  the  sign  plus  one's  complement  case  and 

TKEY (4)  = -((2**(-T7YPE))+(2**(-MANTSA)))*(2**(-(ITYPE* 

( 2 **  ( IEXPNT-- 1 ) ) -MANTS  A)  ) ) (19) 

is  used  for  the  sign  plus  two’s  complement  case. 

Function  Subroutines 

The  function  subroutines  are  called  to  perform  all  additions,  sub- 
tractions, multiplications,  divisions,  exponentiations,  and  assignments. 
The  subroutines,  together  with  the  operation  they  perform  and  the  types 
of  operands  they  have,  are  shown  on  page  47  of  the  thesis  by  Klein 
(Ref  37).  Although  several  changes  were  made  to  these  subroutines, 
their  basic  concept  remains  the  same.  The  first  change  is  that  they  all 
now  have  one  extra  parameter,  the  array  TKEY.  The  subroutines  which 
handle  only  fixed-point  operands  do  not  use  this  parameter,  but  more 
extensive  coding  changes  would  have  been  required  in  the  preprocessor 
to  differentiate  between  the  subroutines  which  have  at  least  one 
floating-point  operand  and  those  which  do  not.  The  second  change  was 


necessitated  when  the  user  option  to  handle  sign  plus  two's  complement 
arithmetic  was  added.  Separate  checks  are  made  for  positive  and 
negative  overflow  in  all  the  subroutines,  and  for  those  subrovitines 


having  at  least  one  floating-point  operand,  there  are  also  separate 
checks  for  positive  and  negative  underflow.  Hie  code  for  printing  ttie 
overflow  and  underflow  messages  was  put  into  the  function  subroutines, 
eliminating  the  small  print  subroutine  that  was  part  of  the  original 
simulator.  The  third  change  is  that  all  the  code  to  accomplish  the 
rounding  and  truncation  for  floating-point  operations  has  been  put  into 
three  special  subroutines. 

Special  Subroutines 

The  three  special  subroutines,  called  ROUNDER,  ONETRNC,  and  TWOTRNC, 
perform  rounding,  sign  plus  one's  complement  truncation,  and  sign  plus 
two’s  complement  truncation  on  floating-point  results.  The  documented 
code  for  these  subroutines  is  shown  in  Appendix  B.  The  option  which 
allows  the  user  to  specify  the  radix  of  the  machine  (ITYPE)  increased 
the  complexity  of  the  code  considerably.  The  exponent  must  be  examined 
before  determining  the  number  of  bits  to  save  in  the  mantissa,  since 
on  a machine  with  a radix  greater  than  two,  the  mantissa  might  bo  shifted 
one  or  more  places  to  the  right.  To  simulate  this  happening,  the  number 
of  places  the  mantissa  would  he  shifted  must  be  computed  and  then  sub- 
tracted from  the  number  of  mantissa  bits  to  save.  For  intermediate 
results,  the  number  of  guard  digits  Is  added  to  give  the  total  number 


of  mantissa  bits  to  save. 

ONETRNC.  The  subroutine  ONETRNC  performs  sign  plus  one's  comple- 
ment truncation  just  as  it  was  performed  in  the  original  simulator  sub- 
routine, except  It  has  added  code  to  hand lc  t he  machine  radix  option 
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and  the  guard  bits  option.  A step-by-step  example  of  bow  ONETRNC  works 
is  shown  below  in  Kig.  6.  This  example  shows  bow  the  result  of  1023 
divided  by  512  would  be  truncated  for  an  octal  machine  with  a word- 
length  of  12  bits,  7 of  which  are  mantissa.  There  is  1 guard  bit  and 
the  number  being  truncated  is  an  intermediate  result. 

a)  0 0011111  1 0001 

b)  17207774000000000000 

c)  00000000000000001720 

d)  00000000000000172077 

e)  17207700000000000000 

Fig.  6.  ONETRNC  Examples 

Line  (a) , which  is  in  binary,  shows  what  the  result  would  look  like  on 
the  simulated  machine.  Line  (b)  shows  what  the  answer  looks  like  on  the 
CDC  CYBER  74  computer  before  being  manipulated  by  ONETRNC.  Line  (c) 
shows  the  first  12  bits  of  the  CDC  CYBER  74  word  after  they  have  been 
shifted  right  48  places,  putting  the  exponent  into  fixed-point  position. 
The  word  has  been  filled  with  the  mantissa  sign  during  the  shift,  and 
following  the  shift,  if  the  result  is  negative,  it  is  complemented.  The 
value  at  this  point  represents  the  biased  exponent  of  the  CDC  CYBER  74 
word.  If  the  sign  of  the  exponent  and  mantissa  are  the  same,  then  the 
exponent  is  subtracted  from  2056,  otherwise  the  exponent  is  subtracted 
from  2055.  This  subtraction  allows  the  FORTRAN  MOD  function  to  compute 
the  number  of  extra  mantissa  shifts  required  to  compensate  for  the 
machine  radix.  The  MOD  function  computes  the  remainder  of  this  dif- 
ference divided  by  the  base  two  logarithm  of  the  machine  radix,  or  ITYPE, 
which  is  stored  in  KEY  (8).  In  this  example,  the  exponent  (976)  was 
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subtracted  from  2055,  giving  1079,  and  I TYPE  is  3,  so  the  number  of 
extra  bits  lost  is  2,  which  is  also  the  number  of  leading  zeros  in  the 
mantissa  in  line  (a).  The  number  of  bits  to  truncate  for  intermediate 
results  is  found  by  subtracting  the  number  of  mantissa  bits  and  guard 
bits  to  save  from  48,  then  adding  the  number  to  lose  due  to  the  machine 
radix.  For  this  example,  the  final  number  of  bits  to  lose  is  48-(7+l)+2, 
or  42  bits,  leaving  6 significant  bits  in  the  CPC  CYBER  74  mantissa. 

The  CDC  CYBER  74  word  is  then  shifted  42  places,  first  to  the  right  with 
a sign  extension  fill  (line  d)  and  then  to  the  left  witli  circular  fill 
(line  e).  The  result  has  the  same  numerical  value  as  line  (a). 

TWPTRNC.  TWOTRNC  works  just  like  ONETRNC  except  that  for  negative 
numbers,  TWOTFNC  truncates  them  away  from  zero  instead  of  toward  zero. 
This  is  done  by  adding  to  the  negative  number  a negative  number  which 
is  composed  of  the  same  exponent  value  and  an  unnormalized  mantissa 
which  may  be  computed  by  treating  the  mantissa  like  a fractional  value 
and  subtracting  the  smallest  non-zero  simulated  mantissa  from  the 
smallest  nonzero  unnormalized  CPC  CYBER  74  mant issa.  The  example  in 
Fig.  7 shows  the  step-by-step  two's  complement  truncation  of  the  result 
of  dividing  -1023  by  512  on  a binary  machine  with  a 12-hit  wordlcngth. 

The  mantissa  length  is  7 bits,  there  are  no  guard  bits,  and  the  binary 
point  is  on  the  left,  l.ine  (a),  which  is  in  binary,  shows  what  the 
result  should  look  like  on  the  simulated  computer.  Line  (b)  shows  what 
the  result  looks  like  on  the  CDC  computer  before  being  manipulated  by 
TWOTRNC.  Line  (c)  shows  the  unnormalized  number  which  is  to  be  added 
to  the  result.  Line  (d)  shows  the  result  of  this  addition.  The  next 
step  is  to  perform  the  shifting  just  as  in  ONETRNC.  Line  (e)  shows  the 


result  after  being  shifted  right  41  places,  and  line  (O  shows  the  final 


a) 

1 0111111  0010 

b) 

60570003777777777777 

c) 

60577740000000000000 

d) 

60563762000000000000 

e) 

77777777777777413477 

f) 

60563777777777777777 

Fig.  7.  TWOTRNC  Examples 

result  after  being  shifted  left  41  places.  The  values  in  line  (f)  and 
line  (a)  are  the  same. 

ROUNDER.  The  subroutine  ROUNDER  has  been  changed  from  the  one  in 
the  original  simulator  in  that  now  it  handles  all  rounding  which  needs 
to  be  done.  Rounding  is  accomplished  in  the  same  manner  for  both  sign 
plus  one's  complement  and  sign  plus  two's  complement  arithmetic,  so  no 
distinction  is  made  between  the  two.  ROUNDER  works  the  same  as  ONETRNC 
and  TWOTRNC  in  the  way  that  the  machine  radix  and  guard  digits  are 
handled.  It  also  performs  a sign  plus  one's  complement  truncation  as 
the  last  step,  just  as  the  others  do.  Before  the  final  shifts  are  per- 
formed, however,  ROUNDER  rounds  by  adding  to  the  operand  an  unnormalized 
number  which  has  the  same  sign  and  exponent  as  the  operand.  The  man- 
tissa of  this  number  is  computed  by  putting  a 1 one  place  to  the  right 
of  the  computed  end  of  the  simulated  mantissa  and  zeros  elsewhere.  For 
a negative  number,  this  mantissa  is  then  complemented.  Fig.  8 shows  an 
example  of  how  the  result  of  dividing  1023  by  512  is  rounded  to  a 7-bit 
mantissa.  No  guard  digits  are  used,  and  the  machine  is  hexadecimal. 

Line  (a)  shows  what  the  result  would  look  like  on  the  simulated  computer, 
and  line  (b)  shows  what  the  result  looks  like  on  the  CDC  CYBER  74  before 


a)  0 0010000  0001 

b)  17207774000000000000 

c)  17200200000000000000 

d)  17214076000000000000 

e)  00000000000000036430 

f)  17214000000000000000 

Fig.  8.  ROUNDER  Examples 

being  manipulated.  Line  (c)  shows  the  unnormalized  number  which  is 
added  to  the  operand  before  truncation  occurs.  In  this  case,  only  4 
mantissa  bits  will  be  saved,  since  3 are  lost  because  of  the  machine 
radix.  Line  (d)  shows  the  result  of  the  addition,  with  the  CDC  handling 
the  special  case  where  the  exponent  has  been  changed.  Line  (e)  shoves 
the.  result  of  the  left  shift  of  44  places,  and  line  (f)  shows  the  final 
result,  which  is  the  same  as  line  (a). 

Summary 

Several  changes  were  made  to  the  original  n-bit  simulator  to  enable 
it  to  simulate  more  accurately  various  computer  wordlengths.  The  pre- 
processor was  modified  so  that  it  builds  an  extra  argument  into  the 
function  subroutine  calls  that  it  places  in  the  modified  FORTRAN  source 
program.  This  extra  argument,  TKEY,  is  the  name  of  a floating-point 
array  which  holds  the  floating-point  overflow  and  underflow  limits  of 
the  simulated  computer.  The  subroutine  SETNBIT  was  changed  to  give  the 
user  more  flexibility  in  simulating  different  computers.  Options  were 
added  to  allow  the  user  to  differentiate  between  sign  plus  one's  com- 
plement and  sign  plus  two's  complement  machines,  to  specify  the  radix 
of  the  machine  being  simulated,  to  specify  different  floating-point  and 
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f ixed-point  word  lengths,  and  to  specify  the  presence  of  guard  hits.  The 
Insertion  of  the  guard-hit  opt  ion  eliminated  the  need  for  the  option  in 
the  original  simulator  which  allowed  the  user  to  specify  whether  compu- 
tations were  performed  in  single,  double,  or  triple  precision.  The 
function  subroutines  were  changed  so  that  all  overflow  and  underflow 
checks  and  messages  are  handled  in  the  subroutines,  and  all  rounding 
and  truncation  is  done  by  three  special  routines  called  ONKTRNC,  TVOTRNC, 
and  ROUNOKR.  Creating  those  special  routines  increased  the  execution 
time  of  the  simulator,  but  at  the  same  time  it  allowed  tor  increased 
system  reliability  and  maintainability. 
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III  Monte  Carlo  Testing 

When  evaluating  numerical  software,  one  find;;  a direct  relationship 
between  the  effort  expended  in  testing  and  the  confidence  in  the  correct- 
ness of  the  software.  One  way  to  assure  correctness  is  to  test  all  pos- 
sible combinations  of  inputs.  Since  this  is  rarely  practical,  a common 
practice  is  to  use  a limited  set  of  input  data  for  testing.  In  this 
chapter  are  discussed  techniques  for  testing  pseudo-random  number  gen- 
erators and  Monte  Carlo  techniques  for  testing  numerical  software.  Also 
mentioned  are  testing  methods  proposed  by  Cody  (Ref  10)  and  others  to 
increase  the  probability  that  numerical  accuracy  problems  in  a given 
numerical  routine  will  be  discovered. 

Generat ing  Pseudo- Random  Samples 

Only  binary  computers  which  perform  two's  complement  arithmetic  are 
considered  in  this  chapter,  and  it  is  assumed  that  floating-point  man- 
tissas are  normalized.  Floating-point  numbers  are  represented  with  man- 
tissas of  length  m bits  and  exponents  of  length  e bits.  Since  the  man- 
tissas are  normalized,  there  are  only  2m_3  unique  mantissa  values. 

There  is  one  extra  mantissa  representation  which  is  used  for  the  value 
0 which  is  not  normalized,  since  the  mantissa  is  all  zeros.  Not  count- 
ing the  right  endpoint  of  the  interval  jo.0,1. oj  , there  are  K unique 
numbers  expressable  where  K may  be  computed  by 

K = (2m-1)*(23_1+l)+l  (20) 

For  the  AN/AYK-15A  computer,  the  mantissa  length  is  23  bits  and  the 
exponent  length  is  8 bits.  Therefore,  from  (20),  K is 

K = (223_1)*(28_1+1)+1  = 541,065,217  (21) 

Thus,  when  testing  even  a simple  routine  such  as  a sine  routine  for 
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sin(7rX/2)  over  the  interval  Jo.O.l.oj,  it  becomes  impractical  to  test 
using  all  possible  inputs.  Therefore,  Monte  Carlo  techniques  arc  often 
employed. 

When  taking  samples  of  input  data,  Cody  (Kef  30)  states  that  the 
input  interval  can  be  divided  into  subintervals,  with  a collection  of 
random  bit  patterns  tested  in  each  subinterval.  Not  all  numbers  in  his 
subintervals  necessarily  have  the  same  exponent.  For  any  of  his  sub- 
intervals containing  numbers  for  which  the  error  varies  substantially , 
the  sub intervals  are  broken  down  further  with  more  random  bit  patterns 
being  chosen  from  each  new  subintorval. 

If  a subinterval  is  considered  to  contain  all  representable  numbers 
with  the  same  exponent,  then  the  numbers  are  evenly  spaced  over  the  sub- 
interval.  This  docs  not  mean  that  one  can  take  uniform  random  samples 


from  the  interval 


[o.O.l.oj  a 


and  expect  to  get  an  unbiased  random  samp- 


ling of  all  possible  values  occurring  in  that  interval.  If  so,  half  of 
the  samples  would  be  expected  to  lie  in  the  interval  jo . 5 , 1 . Clj  . These 
numbers  have  an  exponent  of  0 and  are  uniformly-distributed  over  the 
subinterval  £o.5,l.oj.  For  the  AN/AYK-15A  computer,  there  are  129 

g_  ] 

(2  +1  where  e=8)  different  exponents  which,  when  combined  with  man- 

tissas, produce  numbers  in  the  interval  Jo.O.l.oJ.  Therefore,  less 
than  one  percent  of  all  representable  numbers  in  the  interval  |o.O,l.oj 
actually  lie  in  the  subinterval  £o.5,l.oj. 

Cody  recommends  using  random  bit  patterns,  and  these  can  be  ob- 
tained easily  using  the  CDC  pseudo-rardom  number  generator.  The  pseudo- 
random number  generator  returns  values  which  arc  uniformly-distributed 


over  the  interval 


al  jo.O,l.ol 


Those  numbers  less  than  0.5  can  be  modi- 


fied by 
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S<  1 . o-s 


(22) 


with  the  resulting  numbers  being  uniformly-distributed  over  the  sub- 
interval  £o.5,l.oj.  The  exponent  can  then  bo  modified  to  provide  a 
uniformly- d istributed  pseudo-random  sample  from  any  of  the  129  subinter- 
vals making  up  the  interval  |o.O,l.oj.  Since  all  subintervals  contain 
uniformly-distributed  numbers,  sampling  from  each  subinterval  equally 
helps  ensure  that  the  total  sample  is  close  to  being  uniformly-distri- 
buted over  the  entire  population  of  K representable  numbers.  Although 
this  method  is  not  as  random  as  if  the  exponents  were  also  generated 
randomly,  it  is  easier  to  use  when  testing  numerical  software  with  the 
n-bit  simulator,  since  it  is  easier  to  construct  the  sample  numbers.  In 
this  investigation,  all  pseudo-random  samples  were  generated  by  combin- 
ing a pseudo-random  normalized  mantissa  (from  the  subinterval  ^0.5,l.oj) 
with  a pseudo-random  exponent  which  was  uniformly-distributed  over  the 
population  of  possible  exponents.  Resulting  samples  which  fell  outside 
the  variable  ranges  were  discarded. 

Test ing  of  Pseudo-Random  Number  Generators 

Three  considerations  play  important  roles  in  determining  whether  or 
not  a particular  source  provides  uniformly-distributed  random  or  pseudo- 
random numbers  which  are  adequate  for  use  in  testing.  First,  the  numbers 
must  be  able  to  pass  statistical  tests  which  reveal  departures  from  uni- 
formity and  independence.  Second,  the  numbers  must  be  sufficiently  dense 
over  the  interval  being  used,  which  in  this  case  is  the  subinterval 
£o.5,l.oj.  Third,  the  numbers  should  be  able  to  be  produced  efficiently. 
Since  these  three  properties  rarely  characterize  any  one  method  of  pro- 
ducing random  numbers,  compromises  arc  made.  Uniformity  and  independence 
are  generally  more  important  than  density  or  efficiency  in  determining 
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the  Adequacy  of  any  particular  method  (Ref  19:169-170).  The  numbers 
returned  by  the.  CPC  pseudo-random  number  generator  are  uniformly-dis- 
tributed over  the  interval  £q.0,1.qJ  and  can  be  shown  to  pass  the  pair 
triplet  test,  the  auto-correlation  test  with  lag  ^100,  and  one  of  the 
most  powerful  tests,  the  spectral  test  formulated  by  R.  R.  Conveyou  and 
R.  1).  MacPherson  (Ref  38:82  and  1 A : 82 ) . The  CPC  pseudo-random  number 
generator,  when  used  in  conjunction  with  the  n-bit  simulator  and  equa- 
tion (22),  can  be  used  to  efficiently  produce  numbers  which  are  suffic- 
iently dense  over  the  interval  £o.  5 , .1 . oj  . There  are  many  tests  designed 
to  reveal  departures  from  independence  and  a uniform  distribution. 

Knuth  (Ref  38)  describes  ten  of  them  with  algorithms,  and  other  tests 
can  be  found  in  references  34,  45,  and  54.  The  tests  discussed  here 
represent  those  that  were  used  to  test  the  CPC  pseudo-random  number  gen- 
erator when  used  in  conjunction  with  the  n-bit  simulator.  Tests  were 
conducted  for  a uniform  distribution,  randomness,  and  correlation. 

Uniform  Pistrihut ion.  The  Kolmogorov-Smi rnov  test  (Kef  19:187-188) 
was  used  to  test  whether  or  not  pseudo-random  numbers  were  uniformly- 
distributed  over  the  interval  £o.5,1.oj.  The  sample  cumulative  distri- 
bution function  is 


0 X < X. 


V»  'in  X|iX<Xl+l  

1 X > X 
— n 


(23) 


and  the  theoretical  cumulative  distribution  function  is 

/ 


F(X) 


0 X < 0.5 

2X-1  0.5  < X < 1.0 

1 X > 1.0 


(24) 


The  tost  statistic  P (X)  is 

n 
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(25) 


Dn(X)  = max  | F(X)  - F^CX) | (25) 

The  null  hypothesis  that  the  samples  are  taken  from  a uniform  distribu- 
tion may  be  rejected  with  a confidence  of  1-a  if  I)  (X)  > D and  if 

n — a. 

D^(X)  < D^,  the  null  hypothesis  cannot  be  rejected.  For  a ~ 0.10,  0.05, 
and  0.01,  the  critical  II  values  were  computed  using  equations  (26), 


(27),  and  (28). 


.10 


.05 


1.22 


1.36 


1.63 


(26) 

(27) 

(28) 


. 01  /n 

The  results  of  Kolmogorov-Smi rnov  tests  of  pseudo-random  numbers 
are  shown  in  Table  1.  Columns  1 and  2 show  what  the  numbers  were  used 
for,  column  3 shows  the.  number  of  samples  generated,  and  column  4 shows 
the  computed  values.  These  can  be  compared  to  values  shown  in  col- 
umns 5,  6,  and  7 which  show  the  critical  D values  used  to  test  with  a 

a 

level  of  significance  ft.  These  numbers  tested  were  used  in  evaluating 
the  square  root  and  sine  function  approximations  which  are  discussed  in 
the  next  chapter.  When  testing,  odd-numbered  calls  to  the  pseudo-random 
number  generator  were  used  to  generate  the  mantissa  and  the  even-num- 
bered calls  were  used  to  generate  the  exponents.  The  mantissas  were 
modified,  if  needed,  by  using  equation  (22)  and  were  then  truncated  to 
contain  23  significant  bits,  which  is  the  single-precision  mantissa 
length  of  the  AN/AYK-15A  flight  computer.  The  numbers  used  to  generate 
the  exponents  were  tested  as  returned  by  the  pseudo-random  number  gen- 
erator. The  reasons  for  choosing  a particular  number  of  samples  to  use 
will  be  explained  later  in  this  chapter  and  in  the  next  chapter. 
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USE 

NUMBER  OF 
SAMPLES 

Vx) 

VlO 

Square 

Mantissa 

600 

0.0414 

0.0498 

Root 

Exponent 

600 

0.0307 

0.0498 

Sine 

Mantissa 

25307 

0.00470 

0.00767 

Cosine 

Exponent 

25307 

0.00549 

0.00767 

Table.  1.  Results  of  Kolmogorov- Smirnov  Tost? 


Independence.  Randomness  and  correlation  should  be  tested  for  when 
evaluating  the  independence  of  random  numbers.  Randomness  for  single 
numbers  was  tested  by  vising  the  runs  tost. 

Runs  Test.  The  runs  test  (Ref  47:282-285)  is  based  on  the 
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thus  n mul  n can  be  computed  by 

J - / 


n2  ■>  n2  - < 


N/2  N even 

(N-l)/2  N odd 

With  this  assumption,  p and  o can  be  computed  by 

(N+2)/2  N even 

|(N+l)/2  N odd 


(3?) 


Vi 


< 


(33) 


fN(_N-2) 

4(N-i) 


N even 


U I /(N-1)(N-32  N 

V 4(N-2)"'  N odd 


(34) 


The  null  hypothesis  that  the  samples  are  random  is  rejected  with  a con- 
fidence of  1-ot  if  z falls  outside  the  confidence  interval  £-z  . 

Results  of  tests  using  the  same  pseudo- random  numbers  used  for  the 
Kolmogorov-Smirnov  test  are  shown  in  Table  2.  Columns  1 and  2 show  what 
t he  numbers  were  used  for  and  column  3 shows  the  number  of  samples  gen- 
erated. Columns  4 and  3 show  the  values  of  the  expected  mean  p and  the 

u 


standard  deviation  o 


Column  6 shows  the  actual  number  of  runs  and 


column  7 shows  the  values  of  the  z statistics  computed  using  equation 
(31).  These  can  be  compared  to  the  values  of  (two-tailed  test)  for 

ci  “ 0.1,  0.03,  and  0.01. 

Serial  Correlation  Test.  The  serial  correlation  test  (Ref 
38:64-63)  measures  the  amount  that  U depends  on  U , where  U is  an 

.1  * J 

individual  sample.  The  serial  correlation  coefficient  C is  a statistic 
which  always  lies  between  -1  and  +1  and  is  computed  by 


■<WW-  • ^i'AiV-W  • -+V11 


n("ot,,i+'  • •+ui!-i,"<  w-  • •+un-i>2 

When  C lies  close  to  zero,  then  any  two  consecutive  samples  are  rela- 
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lively  independent  of  each  other,  hut  when  C is  U it  indicates  a total 


linear 

a and  m 

It  +2o 
n n 


dependence,  i.e,  li  - m + nUj  ^ for  all  j and  for  some  constants 

. "Good"  values  of  C are  expected  to  lie  between  )i  -2o  and 

n n 

about  percent  of  the  time,  where  u and  o are  computed  bv 

n n 


_ -1 
Un  (n-i) 


(3ft) 


•.  - -shr V^r1  (37> 

These  two  equations  are  only  conjectured  by  Knuth  but  are  supported  by 

empirical  evidence  (Kef  38:ft4-ft!>).  Successive  mantissas  and  successive 

exponents  were  tested  separately  for  correlation.  Table  3 shows  the 

results  of  tests  ustup,  the  same  pseudo-random  numbers  used  for  the  Kol- 

mogorov-Smi mov  tost.  Columns  1 and  2 show  the  use  of  numbers  and 

column  3 slvows  the  number  of  samples  generated.  Column  4 and  S show 

the  values  of  the  expected  mean  it  and  standard  deviation  o . Column  ft 

n n 

shows  the  values  of  )i  -2o  , column  7 shows  the  values  of  the  C statistics 

n n 

computed  using  equation  (3S),  and  column  8 shows  the  values  of  p^+20  . 
Terminal  ion  Cr  itoria  for  Monte  Carlo  Tost  ing 

Exhaustive  testing  is  prohibitive  in  time  and  cost  in  all  but  some 
special  cases.  The  tester,  if  he  decides  to  test  using  random  input 
data,  is  therefore  faced  with  a decision  on  when  to  stop  testing.  It  is 
assumed  that  the  software,  in  the  state  in  which  it  is  being  tested,  will 
be  rejected  if  any  error  is  encountered.  When  testing  numerical  proper- 
ties, an  error  occurs  whenever  the  relative  or  absolute  error  exceeds 
specified  tolerance  limits.  In  this  sense,  testing  stops  when  the  first 
error  occurs.  A tost  Is  considered  to  be  the  act  of  select ing  a random 
sample  and  then  determining  if  that  sample  produces  an  error  or  not.  In 
this  sense,  all  tests  are  assumed  to  he  independent.  For  each  test, 
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there  are  only  two  possible  outcomes 


an  error  occurs,  or  no  error 


occurs.  An  error  occurrence  is  considered  to  be  a success  and  no  error 
occurrence  is  considered  to  be  a failure.  It  is  also  assumed  that  the 
probability  of  a success,  p,  is  the  same  for  all  trials,  or  tests,  since 
the  test  includes  the  selection  of  the  random  sample.  If  the  first  suc- 
cess occurs  on  the  xth  trial,  it  must  be  preceded  by  x- 1 failures.  The 

x—  \ « 

probability  of  x-1  failures  is  (1-p)  , and  if  multiplied  by  the  pro- 

bability of  a success  on  the  xth  trial,  p,  the  probability  of  getting 
the  first  success  on  the  xth  trial  is  obtained  as  shown  in  equation  (38) 

(Ref  47:54-55,83) . 

8(x;p)  = p(l-p)X  1 (38) 

This  is  the  geometric  distribution , and  for  the  purposes  of  this  inves- 
tigation, it  is  assumed  that  the  testing  of  numerical • software  will 
follow  this  distribution. 

If  no  errors  occur,  the  tester  is  faced  with  the  decision  of  when 
to  stop  testing.  If  it  is  assumed  that  the  software  will  not  be  rejected 
unless  an  error  is  actually  detected,  then  the  producer  assumes  no  risk, 
and  therefore  the  probability  of  a Type  1 error  is  zero.  The  consumer’s 
actual  risk  after  n trials  is  the  probability  that  the  software  was  not 
rejected  if  it  contained  errors.  For  the  software  to  have  not  been  re- 
jected, all  n trials  would  have  to  produce  failures.  Since  the  proba- 
bility of  a failure  on  any  one  trial  is  1-p,  the  probability  of  not 
rejecting  the  software  given  n trials  is  (l-p)n.  The  probability  of  a 
Type  II  error,  8,  is  specified  by  the  consumer  (or  tester)  and  is  the 
maximum  risk  the  tester  is  w’illing  to  take  of  accepting  bad  software. 

To  ensure  that  the  tester's  actual  risk  is  less  than  fl,  n must  be  chosen 
large  enough  so  that  (l-p)n<P.  Solving  for  n gives 

I 

V 

a 
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- In(l-p) 


(39) 


To  got  a value  for  p,  some  assumptions  must  bo  made  about  numorical 
errors  in  software.  Tbo  following  example  is  presented  to  show  why 
assumptions  might  be  made  to  compute  p.  The  AN/AYK-15A  computer  with 
its  23-bit  mantissa  is  vised  in  this  example.  If  the  function  sin(nX/2) 
is  being  evaluated  over  the  interval  |o.5,l.oJ,  then  the  exponent  of  all 

possible  inputs  is  always  0,  and  the  values  of  X are  uniformly  distri- 

. 22 
buted  across  the  interval.  However,  there  are  2 , or  4,194,304  of 

those  values.  If  the  software  is  rejected  with  only  one  error  (and  it 
is  assumed  that  all  the  rest  of  the  inputs  do  not  produce  errors),  then 
p is  the  percent  of  the  input  tested  defective  to  be  used  as  the  rejec- 
tion criterion  and  may  be  computed  by 


1__  = 1__ 

922  = 4194304 


0. 2384*10 


-6 


(40) 


If  the  tester  wishes  to  be  95  percent  sure  he  correctly  identifies  a 
piece  of  software  for  rejection,  then  3 is  0.05.  Using  these  values, 
the  tester  would  have  to  conduct  nearly  three  times  as  many  tests  with 
random  samples  as  there  are  input  possibilities.  The  value  of  n is 
computed  by 


log  P -1.30103  ^ 

log  (1-p)  -0.10 354* 10"6  0*1-'65  10 


(41) 


This  says  it  would  be  easier  to  conduct  an  exhaustive  test.  The  problem 
lies  in  the  assumption  of  the  single  error  found  among  all  possible  in- 
puts. It  seems  highly  unlikely  that  one  value  would  exceed  t lie  error 
tolerances  without  the  two  numbers  adjacent  to  it  also  exhibiting  sim- 
ilar characteristics.  The  problem  becomes  one  of  specifying  a realistic 
value  for  p. 
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A simple  case  of  cancellation  of  terms  is  presented  to  show  how 

relative  errors  large  enough  to  cause  software  rejection  can  occur.  In 

this  example,  the  binary  machine  truncates,  the  mantissa  length  is  m, 

-k 

and  the  relative  error  bound  is  specified  to  be  2 . If,  for  example, 

the  relative  error  bound  is  specified  to  be  10  then  k = 5(ln  10)/ 

(In  2),  or  approximately  16.61.  The  variable  Y is  computed  as  shown  in 
equation  (42),  and  the  constant  C,  which  has  an  exponent  e,  is  assumed 
to  have  a relative  error  of  2 m. 

Y = C - X (42) 

For  all  values  of  X within  a distance  I)  of  C where  D is  computed  by 

D - (2"m+k"1)*(2C)  (43) 

at  least  the  (ra-k+1)  most  significant  bits  of  the  mantissa  are  lost  in 
the  subtraction,  leaving  at  most  k-1  significant  bits.  If  it  is  assumed 
that  all  values  of  X which  fall  within  a distance  D of  C have  the  same 
exponent  as  C,  then  the  number  of  unique  values  of  X which  He  within  a 
distance  D of  C is  computed  to  be  Z by 

Z = 1 + 2*(2k_1-l)  « 2k-l  (44) 

Since  the  total  number  of  possibilities  for  X (with  the  same  exponent  as 

Tfl — \ 

C)  is  2 , the  portion  of  the  subinterval  which  causes  rejection  is  R. 

R = Z/(2™“1)  = (45) 

2m-'' 

For  the  case  of  the  AN/AYK-15A  computer,  if  k lias  been  specified  to  be 
17  (giving  10  ^ jrQ^ativc  accuracy),  then  R may  be  computed  by 

217-1  -5 

R - ~~  » 2 = 0.03125  (46) 

2 

This  value  of  R says  that  there  is  a block  of  consecutive  numbers  cen- 
tered about  C in  one  subinterval  which  cause  a relative  error  greater 
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than  2~^,  and  that  those  numbers  comprise  over  three  percent  of  the 


numbers  in  that  subinterval. 

In  actual  situations,  it  cannot  be  assumed  that  all  the  numbers 
within  a distance  D of  C have  the  same  exponent  as  C,  so  Z'  must  be 
computed.  If  the  exponent  of  C is  neither  the  maximum  positive  nor 
negative  exponent  of  the  computer  being  simulated,  then  for  binary  com- 
puters, Z'  falls  iu  the  range  J^3Z/4, 3Z/l’j  . If  the  exponent  is  the  maxi- 
mum positive  value,  then  Z'  falls  in  the  range  ^2/2, 3Z/zj  , and  if  the 
exponent  Is  the  maximum  negative  value,  then  Z*  falls  in  the  range 
fz/2,Zj  . In  these  special  cases,  overflow  and  underflow  combined  with 
the  normalization  of  the  mantissa  can  limit  the  number  of  unique  values 
within  a distance  0 of  C,  so  7.'  can  take  on  smaller  values.  If  R*  is 
computed  as  shown  in  equation  (47),  then  R*  gives  a minimum  proportion 
of  consecutive  numbers  in  the  same  subinterval  as  the  value  C which 


cause  too  large  a relative  error. 

R*  - min(Z')/2n’-1  = |/2ra_1  = |m 


(2k-l)*(2-m) 


If  the  gross  assumption  is  made  that  relative  errors  1)  propagate  through 
n piece  of  software,  2)  generally  don't  diminish,  and  3)  are  caused  by 
cancellation  of  terms,  then  R*  can  be  used  to  approximate  p in  equation 


» lnft 

n ± Tnfl^p)  (39) 

If  the  tester  is  testing  over  w subintervals,  then  the  probability 
of  getting  a success  is  p/w  and  equation  (39)  can  be  rewritten  to  re- 
flect the  testing  over  the  entire  variable  range. 


In ( l-1 ) 
w 


— 


When  a multi-variate  function  is  being  evaluated,  another  assump- 
tion is  made  concerning  the  proportions  of  different  variables  which 
can  cause  a success  (an  error  larger  than  the  specified  limits)  to  occur. 
It  is  assumed  that  a certain  proportion  of  each  variable  cause  success 
and  that  these  proportions  are  independent  of  the  other  variables.  Cer- 
tain combinations  of  input  values,  none  of  which  alone  causes  a success, 
can  together  also  cause  a success  to  occur.  However,  these  cases  were 
not  considered  in  arriving  at  the  number  of  random  x trials  needed, 
since  they  would  tend  to  decrease  the  number  of  trials  required.  As  in 
the  single-variate  case,  each  variable  will  be  sampled  from  its  total 


number  of  possibilities  using  pseudo-random  numbers  to  generate  the  man- 


. th 


tissa  and  exponent.  The  proportion  of  all  possible  values  of  the  in- 


variable which  cause  success  is  represented  by  p.  witli  the  range  of  the 


. th 


i — variable  being  broken  up  into  w^.  subintervals.  The  total  number  of 


variables  is  denoted  by  V.  The  proportion  P of  all  possible  multi- 
variate inputs  is  computed  by 


V p,  V p p. 

P = l — - l -1  + CT 
i=l  Wi  i,j  wi  wj 


(49) 


where  0 denotes  terms  of  order  three  or  higher. 

The  number  of  input  samples  (n)  which  guarantee  that  the  consumer's  risk 
is  not  greater  than  3 can  then  be  obtained  using 


n s InB 

- ln(l-P) 


(30) 


This  still  does  not  account  for  gradual  error  buildup  in  which  a much 
smaller  proportion  of  a subinterval  produces  a relative  error  which  may 
be  only  one  bit  short  of  the  required  relative  accuracy.  In  these  cases, 
the  error  buildup  can  usually  be  noticed  using  error  plots,  since  many 
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other  numbers  might  provide  the  required  relative  accuracy  with  as  few 
as  zero  or  one  bits  to  spare.  Cody  (Ref  10)  suggest  that  when  this 
occurs  and  it  appears  likely  that  the  software  will  fail,  more  extensive 
testing  can  be  conducted  over  the  appropriate  subintervals  in  question. 

Anomclies  other  than  cancellation  of  terms  can  occur  in  numerical 
software.  Cody  (Ref  10)  also  suggests  that  special  inputs  be  constructed 
which  will  check  the  boundaries  of  subintervals,  boundaries  of  the  input 
variable,  and  any  cross-over  and  neighboring  data  points  where  the  rou- 
tine changes  algorithms.  These  cases  are  difficult  to  construct  and 
must  currently  be  done  manually.  When  used  in  conjunction  with  random 
testing  using  sub intervals , they  increase  the  tester's  confidence  that  a 
numerical  software  routine  performs  correctly. 

Summary 

A tester  using  Monte  Carlo  techniques  to  test  software  is  faced 
with  the  question  of  when  to  stop  testing.  His  maximum  risk,  3,  is 
specified,  but  he  may  know  very  little  about  the  error  characteristics 
of  the  software.  By  making  an  assumption  about  the  nature  of  the  error 
characteristics,  the  tester  can  arrive  at  a number  of  trials  at  which  to 
stop  testing.  The  tester's  confidence  in  the  results  (assuming  the 
tester  doesn't  get  any  successes  using  random  inputs)  can  be  increased 
by  conducting  extra  tests  at  the  boundaries  of  the  variables  and  other 
special  points,  by  using  error  plots,  and  by  monitoring  the  error  devi- 
ations over  small  subintervals. 


IV  Kv.'i  1 \ifit  (on  ol  Avion  I ok  Mat  hemal  leal  Kent  (non 

M;my  inertial  navi  gal  (on  ami  I ( re  control  rout  (non  require  (lut l 
cosine,  aim',  at  cl  augonl  , ami  square  root  functions  bo  ovaluat  ed.  Two 
haaio  methods  won'  eons  I doled  lor  simulating  I bo  I r I gonomot  r I o functions 
ami  t bo  square  i oot  lunotlon  wbon  tboy  avo  required  by  a simulated  I light 
rout  1 no,  Tbo  ('.out  ml  Hal  a Corporation  (CMC)  library  rout  Inos  could  be 
used,  with  I bo  simulator  adjusting  tbo  lunotlon  valuo  returned,  or  now 
lout  lues  oouUI  bo  eonst  mot  oil  whloh  wouUI  thou  bo  modified  bv  tin'  n bit 
.simulator.  Tlioao  modified  rouiluoa  would  thou  bo  compiled  along  w t ( 1 1 
tbo  programs  whloh  call  thorn,  thereby  more  aocuiately  rel  loot  lug  t lu> 
aoouraoy  obtainable  on  the  simulated  maohlne.  Tbo  sine,  cosine,  arc- 
tangent, and  square  root  functions  were  all  evaluated  with  the  trans- 
mitted error  being  removed  (Uol  ll:l?'l).  The  arctangent  lunotlon,  when 
required  by  a simulated  flight  routine,  was  evaluated  using  the  CMC 
library  routine  with  the  result  being  truncated  by  the  n-blt  simulator. 
Other  arctangent  approximations  wore  not  analyzed.  Several  have  been 
proposed  by  Hart  (ltd  12).  The  algorithms  presented  In  this  chapter  lor 
the  square  root,  sine,  and  cosine  I unctions  are  "optimal"  lot  the  AN/ 

AYK  ISA  Might  computer  In  the  sense  that  they  provide  the  tequlrod 
accuracy  specified  by  reference  S7  while  requiting  a minimum  number  of 
computer  instructions  for  implement  tit  Ion.  Othet  npproximat  Ions  whloh 
were  analyzed  are  presented  in  Appendix  C.  All  stored  constants  have 
been  chosen  t o minimize  the  relative  error,  since  this  It;  the  same  as 
maximizing  the  number  ol  correct  mantissa  bits  (mantissa  normalized), 
lit  evaluating  these  functions,  the  absolute  error,  AM,  1st  defined  as 
the  difference  between  t lit'  approximated  solution,  W,  and  the  exact  sol 
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ut  1 on , Y . 


AE  - W - Y 


(51) 


The  relative  error,  RE,  is  defined  to  t>e  the  absolute  error  divided  by 
the  exact  solution  and  Is  not  defined  when  the  exact  solution  equals 
zero  (Kef  18:  !>)  ; 

RE  “ AE/Y  ~ (W  - Y)/Y  (Y  / 0)  (52) 

For  the  plots  shown  In  this  chapter,  the  horizontal  axes  are  used  to 
represent  the  input  arguments  and  the  vertical  axes  are  used  to  represent 
either  the  absolute  error,  AE,  or  tin-  relative  error,  RE, 


Square  Root 


For  the-  square  root  function,  several  approximations  were  analyzed 
in  addition  to  that  of  truncation  the  CI1C  result  with  the  n-blt  simula- 
tor. The  two  solutions  proposed  both  use  Newton's  iterative  method  for 
evaluating  /X  where  X > 0 (Ref  18:23).  Newton's  method  Is  to  choose  an 
initial  approximation  Yq  and  then  compute  Yj,  Y->,  Yj,  . . . defined  by 
the  recursion  formula 


k«  1 


- (Y t + v )‘  k “ °*  !»  2* 
* k k 


(53) 


Convergence  Is  quadratic,  meaning  that  as  Y^  gets  sufficiently  close  to 
/x,  Y^^j  has  approximately  twice  as  many  correct  digits  as  Y^.  Range 
reduction  schemes  are  commonly  employed  to  reduce  t he  maximum  relative 
error  (n  the  first  approximation. 

Spec l f Icat ions.  Specifications  for  the  square  root  function  Include 
the  following: 

1)  that  it  not  call  other  subroutines  (Ref  23:58  and  57:149) 

2)  that  the  maximum  output  would  be  327(>.  7 (Ref  57:92) 

3)  that  the  maximum  error  was  to  be  0.01 

The  number  3278.7  is  the  maximum  aircraft  speed  which  might  be  encount- 

ered when  computed  by 
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7 » -\/v'  + V 
a y X 


(54) 


V and  V represent  aircraft  velocity  vectors  in  an  X,  Y coordinate 
x y ' 

system,  and  Va  represents  the  aircraft  track  vector  with  maximum  magni- 
tude 3276.7.  The  maximum  error  of  0.01  corresponds  to  the  resolution  of 
the  aircraft  instruments  and  was  assumed  to  he  the  maximum  absolute 
error  which  would  occur  at  or  near  the  maximum  output  value.  The  maxi- 
mum relative  error,  MRK,  was  therefore  assumed  to  be  3.0*10  ; 


MRF.  “ 3.0*10~6  < 3.052*10''6 


0.01 

'3276.  7 


(55) 


Assumptions.  For  the  purpose  of  measuring  absolute  and  relative 

errors  which  were  estimated  to  be  on  the  order  of  10  * , the  CDC  square 

root  function  with  its  relative  error  bound  of  3.0*10  ^ (Kef  14:139) 

was  assumed  to  be  accurate  enough  to  use  as  the  standard. 

Testing  Criteria.  Each  solution  was  evaluated  using  Monte  Carlo 

techniques  as  described  in  the  previous  chapter.  6 was  chosen  to  be 

0.01  and  only  two  subintorvals  were  tested:  Jo.25,0.5oj  and  ^0.50,1.0oj. 

This  was  because  only  the  relative  error  was  being  measured.  The  rela- 

-b  -17 

tive  accuracy  required  was  10  , or  approximately  2 . Using  values  of 


17  for  k and  23  for  m (mantissa  length),  K*  is  computed  to  be  0.015625, 

(56) 

This  value  is  then  substituted  for  p in 


R*  =■  (2k  - 1)  *(2-m) 


n > -Ji*- 


ln(l-M 

w 


(57) 


to  obtain  a value  for  n,  where  n is  the  number  of  pseudo-random  samples 
required  to  test  to  a significance  level  of  1-(1  (see  Chapter  3). 

The  value  of  n is  computed  to  be  588; 
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n > L'lC'lr01)  ~ 537.2 

— - 0.015625v 

ln(l 2 ) 


(58) 


The  results  of  testing  800  random  trial  numbers  for  uniformity,  random- 
ness, and  correlation  are  shown  in  Tables  1,  2,  and  3 of  the  preceding 
chapter.  These  numbers  were  supplemented  by  10,000  other  pseudo-random 
numbers  when  constructing  the  absolute  and  relative  error  plots.  The 
extra  numbers  were  used  to  smooth  out  the  error  curves.  The  purpose  in 
smoothing  out  the  error  curves  was  to  show  more  accurately  the  nature  of 
the  maximum  error.  These  extra  numbers  were  generated  by  single  calls 
to  the  CDC  pseudo-random  number  generator.  Each  call  returns  a value 
uniformly-distributed  over  the  interval  ^O.O.l.oj,  and  by  modifying 
those  numbers  which  fall  in  the  interval  ^0.00,0.23j  by 

S «-  0.5  - S (59) 

approximately  5000  pseudo-random  numbers  were  obtained  for  each  of  the 
two  subintervals  being  tested.  All  the  Inputs  were  divided  into  100 
evenly-spaced  intervals  for  plotting  purposes.  The  plots  show  the  maxi- 
mum (positive  and  negative)  absolute  and  relative  errors  obtained  over 
each  of  the  100  plotting  intervals. 

Solution  1.  The  first  method  for  approximating  the  square  root 
function  is  to  call  the  CDC  square  root  function  and  truncate  the  result 
to  the  desired  wordlength.  If  the  computer  being  simulated  has  a man- 
tissa length  of  m bits,  then  the  effect  of  truncating  a value  such  as 
that  returned  from  the  CDC  square  root  routine  is  to  introduce  a negative 

^ | J 

relative  error  (see  equation  52)  not  less  than  -2  . For  the  AN/AYK- 

-22 

15A  flight  computer,  this  relative  error  hound  would  he  -2  ; or  approx- 
imately — 2 . 4*10  The  absolute  and  relative  error  plots  over  the  in- 

terval 1^0.00,  l.Ooj  are  shown  in  Fig.  9 and  Fig.  10,  with  each  plot 

4(> 


showing  the  maxima  and  minima  over  the  100  evenly-spaced  intervals. 

Each  break  in  the  lower  bound  occurs  where  the.  exponent  of  the  square 
root  changes. 

Using  the  CDC  square  root  function  is  easier  than  using  either  of 
the  other  two  solutions  which  follow,  since  only  one  line  of  code  is 
required  in  the  FORTRAN  flight  routine.  The  error  bounds  are  approxi- 
mately the  same  as  those  produced  by  solutions  2 and  3,  so  little  can  be 

gained  (in  terms  of  simulation  accuracy)  by  using  either  solution  2 or  3 

when  simulating. 

Solution  2.  The  second  method  for  computing  the  square  root  func- 
tion uses  a combination  of  a linear  minimax  polynomial  and  an  exponent 
shift  to  get  an  initial  approximation  (Ref  33:25).  Once  the  first 
approximation  has  been  computed,  two  iterations  of  Newton's  method  are 
applied  to  obtain  the  desired  relative  accuracy. 

The  exponent  of  the  initial  approximation  is  computed  by  right- 

shifting  one  place  the  exponent  of  the  input  argument.  If  a 1 is  shifted 

off  (odd  exponent),  then  the  shifted  exponent  is  increased  by  one  and 

the  mantissa  is  shifted  one  place  to  the  right.  A relative  error  of 

-23 

-2  can  be  introduced  if,  during  a mantissa  shift,  a 1 is  shifted  off 
the.  end.  This  mantissa  shift  then  leaves  an  unnormalized  mantissa  which 
lies  in  the  range  |o.25,0.5oj.  Otherwise  the  mantissa  lies  in  the  nor- 
mal range  |o.  50, 1.  ooj  . A linear  minimax  polynomial, 

Mq  = A*M  + B (60) 

is  used  as  the  first  approximation,  Mq,  to  the  mantissa  of  the  square 
root  of  the  input  argument  with  mantissa  M.  This  is  then  combined  with 
the  shifted  exponent  to  obtain  the  first  approximation  for  the  square 
root  of  X. 
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Since  the  error  measured  is  the  relative  error,  the  minimax  property 


(that  the  maximum  error  is  a minimum)  should  hold  for  the  relative  error 
of  the  final  output  value.  Hemker,  et  al  (Ref  33:25),  solve  for  the  two 
minimax  coefficients  A and  B,  giving  A = 0.6862915010151  and  B = 0.343 
1457505076.  These  coefficients  may  also  be  expressed  by 

A = 2*B  (61) 

B = 6-4  /2  (62) 

The  maximum  relative  error,  MKE  of  the  initial  approximation,  occurs  at 
both  endpoints  and  at  one  interior  point  and  may  be  computed  using  the 
right  endpoint; 


MRE  = max 


A*X+B-/X 

A+B-l 

/X 

1 

= 17-12  /2  = 0.02944 


(63) 


X=1 


Two  iterations  of  Newton's  method  carried  out  on  a machine  with  infinite 

,-8 


precision  would  give  a final  maximum  relative  error  less  than  9*10 
This  is  only  slightly  better  than  the  actual  relative  accuracy  obtain- 
able using  single  precision  (23-bit  mantissa)  on  the  computer  being 
simulated. 

Solution  2 was  tested  over  the  interval  Jo.25,1.0oj  with  the  num- 


bers in  the  interval  j^0.25,0.5oj  representing  those  that  would  have  had 
an  odd  exponent.  Hence,  their  accuracy  was  truncated  to  22  bits,  thereby 
simulating  the  extra  right  shift.  The  minimax  polynomial  and  the  two 
Newton  iterations  were  evaluated  using  a 23-bit  mantissa  for  all  cases. 
The  absolute  and  relative  error  plots  are  shown  in  Fig.  11  and  Fig.  12. 

The  effect  of  the  initial  minimax  approximation  for  the  mantissa 
can  still  be  observed  in  these  plots.  If  another  iteration  of  Newton's 
method  is  performed  on  the  AN/AYK-15A  computer,  this  effect  would  com- 
pletely disappear.  However,  the  maximum  (negative)  relative  error  bound 


49 


would  not  be  Improved  any. 

Solution  J) . The  third  method,  which  also  uses  a combination  of  a 
linear  minimax  polynomial  and  an  exponent  shift  to  get  an  initial  approx- 
imation, is  presented  as  an  alternative  to  the  second  solution.  The 
exponent  of  the  first  approximation  is  computed  by  right-shifting  one 
place  the  exponent  of  the  input  argument.  If  a 1 is  shifted  off  (odd 
exponent),  this  method  sets  a flag  to  signal  a later  multiplication  by  a 
pre-stored  constant  /2.  No  mantissa  shifting  is  required,  so  the  man- 
tissa always  lies  in  the  range  ^0.50,1.0C)J  . Since  the  range  for  the 
minimax  approximation  is  smaller,  the  maximum  error  using  the  linear 
polynomial  is  much  smaller.  The  number  resulting  from  the  exponent  shift 
combined  with  the  mantissa  from  the  minimax  polynomial  is  then  multiplied 
by  /2  if  the  exponent  of  the  input  argument  was  odd  (flag  set).  The 
coefficients  of  the  minimax  polynomial, 


may  be  expressed  by 


Mq  = A*M+B 


/2  B 


(1  + 4/2)2 


(60) 

(64) 

(65) 


A and  B,  rounded  to  ten  significant  digits,  have  the  values  A = 0.590162 
0671  and  B = 0.4173075996.  The  maximum  relative  error,  MRE,  still  occurs 
at  X = 1.0  and  two  other  places  and  may  be  expressed  by 

■>2V 


MRE  = max 


A*X+B-A 

— 7x — 


A+B-l 


< 0.00747 


(66) 


^ 'x=l  ^1+^2/ 

The  maximum  relative  error,  MRE,  as  shown  does  not  account  for  the  error 

introduced  by  the  extra  multiplication  by  /2  which  is  sometimes  needed. 

-23 

The  pre-stored  constant  r'2  has  a relative  error  less  than  2 , so  to 


get  the  maximum  relative  error,  MRE' , of  the  initial  approximation,  the 
two  relative  errors  must  therefore  be  added. 


MRE’  = MRE  + 2 
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< 0.00747 


[0.25. 


1.00 


(67) 

witli  the  numbers 


This  solution  was  tested  over  the  interval 
in  the  interval  j^0.25,0.5oj  representing  those  inputs  that  would  have 
had  an  odd  exponent  (and  hence  would  have  been  multiplied  by  /2) . The 
absolute  and  relative  error  plots  are  shown  in  Fig.  13  and  Fig.  14. 

This  method  has  two  distinct  advantages  over  the  method  of  Solution 
2.  First,  it  does  not  work  with  unnormalized  numbers,  and  second,  the 
initial  estimate  is  much  better  than  that  for  solution  2,  thereby  giving 
better  convergence  using  Newton's  method.  The  major  disadvantage  to 
this  solution  is  the  extra  multiplication  required  whenever  the  exponent 
of  the  input  argument  is  odd.  Both  solution  2 and  solution  3 should  be 
coded  in  an  assembly  language  to  facilitate  the  shifting  required  to  get 
the  initial  approximation. 

Recommendations.  Two  recommendations  should  be  made  concerning 
these  three  tests.  First,  when  simulating  using  the  n-bit  simulator, 
it  is  easiest  to  just  call  the  CDC  square  root  function  and  truncate  the 
answer  to  the  appropriate  accuracy.  This  also  gives  an  adequate  por- 
trayal of  the  errors  which  would  be  encountered  normally. 

Second,  when  comparing  the  methods  presented  in  solutions  2 and  3, 

the  machine  wordlength  must  be  considered.  The  accuracy  of  the  method 

presented  in  solution  2 is  limited  by  the  machine  wordlength  until  a 

mantissa  length  of  24  or  25  bits  is  reached.  For  longer  mantissa 

lengths,  no  significant  accuracy  is  gained  since  the  maximum  relative 

error  bound  using  only  two  iterations  of  Newton's  method  is  approximately 
_ g — (23  A) 

9*10  , or  2 v \ For  mantissa  lengths  longer  than  24  bits,  solution 
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0.00 


2 could  be  modified  to  add  more  iterations  of  Newton's  method.  The 
accuracy  of  the  method  presented  in  solution  3 surpasses  that  of  the 
method  presented  in  solution  2 around  the  24-bit  mantissa  length.  The 
accuracy  of  solution  3 continues  to  improve  significantly  until  a man- 
tissa length  of  approximately  32  bits  is  reached,  at  which  time  the 
dominant  error  is  no  longer  the  machine  truncation  error.  The  reason 
that  the  method  of  solution  3 is  able  to  obtain  better  accuracy  lies  in 
the  fact  that  a shorter  range  is  considered  when  using  the  minimax  first 
approximation,  thereby  giving  a much  better  result  from  the  two  Newton's 
method  iterations. 

Sine  Function 

For  the  sine  function,  several  Taylor  series,  minimax,  and  continued 
fraction  approximations  were  tested.  Each  proposed  sblution  was  executed 
using  the  n-bit  simulator  with  single  precision  (23-bit  mantissa)  being 
specified. 

Since  errors  on  the  order  of  10  ^ to  10  ^ were  expected,  the  GDC 
sine  function,  which  has  an  error  less  than  10  (Ref  14:141),  was  con- 
sidered to  be  exact  in  computing  the  absolute  and  relative  errors  of  the 
proposed  solutions.  The  formula  for  computing  the  absolute  error  AF.  is 

AE  = W - Y (68) 

and  the  formula  for  computing  the  relative  error  RE  is 

RE  = AE/Y,  (Y  1 0)  (69) 

In  these  equations,  the  variable  W represents  the  answer  obtained  using 
the  proposed  solution  and  the  variable  Y represents  the  answer  obtained 
from  the  CDC  library  sine  function. 

Specifications.  Several  conditions  for  the  sine,  function  were 
specified  by  the  F-16  Inertial  Navigations  Specification  (Ref  57)  and 
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the  F-16  Fire  Control  Specification  (Kef  23).  The  first  condition  was 
that  the  data  input  range  was  to  he  ^-2n,2nj.  The  second  condition  was 
that  the  error  was  to  he  less  than  10  . There  was  no  indication 

whether  this  error  bound  was  Cor  the  absolute  error,  the  relative  error, 
or  both,  so  it  was  assumed  to  be  the  absolute  bound  for  both.  The  third 
condition  was  that  the  sine  function  must  not  call  other  subroutines. 

Awsumpti ons . Several  conditions  which  were  assumed  to  be  advan- 
tageous are  shown  below  and  are  some  of  those  listed  by  llemker,  ot . al. 
(Kef  33:39-42): 


1) 


lira  - 


1 


XK) 


2) 


sin(X) 

X 


< 1 


(all  X) 


3)  Optimal  relative  accuracy  is  obtained 

4)  The  number  of  multiplications  used  is  minimal 

5)  The  odd  character  of  the  sine  function  is  preserved. 

Kan  go  Reduet  ions.  The  primary  reason  for  reducing  the  argument 
range  of  an  approximation  is  to  enable  approximations  to  be  used  which 
have  a small  number  of  terms  and  can  be  evaluated  quickly.  For  poly- 
nomial approximations,  larger  variable  ranges  mean  higher  order  poly- 
nomials must  be  used  to  obtain  the  desired  accuracy.  Beyond  some  point, 
however,  the  costliness  of  the  range  reductions  will  offset  the  advan- 
tages (Kef  18:39,44). 

The  first  range  reduction  for  the  sine  approximation  used  the  tvi- 
gonome trie  i den  t i t y 


sin(X)  ■ -sin(-X)  (70) 

to  reduce  the  range  to  £o,2nj  . This  range  reduction  also  guaranteed 

that  the  odd  character  of  the  sine  funct ion  would  be  preserved.  The 
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identity 


sin  (X)  = -sin(2Tr-X)  (71) 

was  used  to  reduce  the  range  further  to  ^O.nj  , and 

sin(X)  = sin(ir-X)  (72) 

was  used  to  complete  the  range  reduction,  resulting  in  a final  input 
range  of  ^0,Tr/2j  . All  range  reductions  were  accomplished  using  extended 
precision  to  reduce  the  effects  caused  by  cancellation  of  terms.  Since 
the  sine  approximation  must  not  call  other  subroutines,  the  input  range 
was  not  further  reduced  to  J0.Tr/4j  using 

sin(X)  = cos(ir/2-X)  (73) 


Testing  Criteria.  Solutions  were  evaluated  using  Monte  Carlo  tech- 
niques as  described  in  the  previous  chapter.  B was  chosen  to  he  0.05 
and  for  the  AN/AYK-15A  computer,  there  are  264  subintervals  containing 
numbers  from  the  interval  £-2ir,2nj  . Using  values  of  17  for  k (2  k 
accuracy  - see  Kef  23,57)  and  23  for  m (mantissa  length),  R*  is  computed 
to  be  0.C15625; 


R*  - (2k  - 1) (2~m)  * 0.015625 


(74) 


This  valve  is  then  substituted  for  p in 

InB 


n > 


In  (1  - ^-) 


(75) 


to  obtain  a value  for  n,  where  n is  the  number  of  pseudo-random  samples 
required  to  test  to  a significance  level  of  1-B  (see  chapter  3). 

The  value  of  n is  computed  to  be  50,615; 

ln(0. 05) 


n > 


ln(l  - 


0.015625 

264 


- 50614.4 


(76) 


Since  the  interv; 


onlv  contains  numbers  from  half  as  many  inter- 


/al  [o,2tt1 

vals  as  the  interval  J-2n,2nj  does,  actual  testing  occurred  over  the 
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The  first  range  reduction  in  the  sine  routine  uses  the 


interval  ^0,2it 
trigonometric  identity 


sin(X)  « -sin(-X)  (70) 
so  the  test  Is  just  as  significant  over  the  reduced  range.  This  allows 
w to  be  set  to  132  in  equation  (69),  and  now  n is  computed  to  be.  25307, 


. ln(0. 05)  „ / 

n - , 0.015625  n"  2530  '•  * 

1,1(1  132  ' 


(77) 


thus  giving  a 50  percent  savings. 

25307  pseudo- random  trials  were  constructed  and  the  results  of 
testing  these  numbers  Cor  a uniform  distribution,  randomness,  and  cor- 
relation are  shown  in  Tables  1,  2,  and  3 of  the  preceding  chapter. 
Usually,  when  such  large  samples  are  tested,  the  serial  correlation 
test  can  be  rewritten  to  be  more  efficient  (Kef  38:64-65)  and  the 
Kolmogorov-Smi rnov  and  Runs  tests  can  be  handled  by  establishing  a 
chained  list  in  increasing  order  of  magnitude  instead  of  using  arrays 
held  and  sorted  in  core  (Ref  19:188). 

Solution.  The  solution  presented  is  a seventh-order  minimax  approx- 
imation to  the  sine  function  witli  the  relative  error  exhibiting  the  mini- 
max property.  Once  the  argument  X lias  been  reduced  to  the  range  £o,n/2j, 
it  is  scaled  (not  reduced)  to  the  range  by  the  change  of  variables 

Y = (2/n)*X  (78) 

so  that  the  sine  function  is  now  expressed  in  terms  of  the  variable  Y. 

The  next  step  is  to  compute  7 = Y*Y,  thereby  reducing  the  number  of  mul- 
tiplications required.  Tbc  minimax  polynomial  can  then  be  expressed 
using  Horner’s  Rule  for  nested  multiplication; 

sin(7!Y/2)  = (((C  *7.  + CJ  * Z + C-)  * 7 + C.)  * Y (79) 

4 3 l 1 


The  coefficients  of  this  polynomial  were  computed  by  Hart,  et.  al. 
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(Ref  32:117-118,  237)  and  are  shown  in  the  second  column  of  Table  4. 
These  coefficients  were  rounded  to  23-hit  accuracy  and  input  in  octal 
format.  The  approximate  decimal  values  of  the  rounded  coefficients  are 
shown  in  the  third  column  of  Table  4. 


Coefficients  Computed 
by  Hart 

. — ...  — ^ 

Coefficients  Rounded 
to  23-bit  Accuracy 

C1 

1.5707948511 

1.570794820786 

C- 

-0.6459209764 

-0.6459209918976 

2 

C3 

0.0794876542 

0.07948765158653 

C4 

-0.004362469 

-0.004362468607724 

Table  4.  Minimax  Sine  Coefficients 


Absolute  and  relative  error  plots  for  the  approximation  to  the  sine 
function  on  the  reduced  interval  [o,TT/2j  are  shown  in  Fig.  15  and  Fig. 

16.  The  pattern  which  appears  over  this  interval  is  found  to  repeat  in 
various  forms  over  the  other  parts  of  the  interval  £o,2trj  for  both  sine 
and  cosine  approximations.  This  mirror  image  effect  is  caused  by  the 
range  reductions.  The  absolute  and  relative  error  plots  for  the  minimax 
approximation  to  the  sine  function  on  the  interval  ^0,2iij  (using  the 
mentioned  range  reduction  techniques)  are  shown  in  Fig.  17  and  Fig.  18. 
These  plots  can  be  compared  to  the  absolute  and  relative  error  plots 
shown  in  Fig.  19  and  Fig.  20  which  were  obtained  by  truncating  the  value 
returned  from  the  CDC  6600  library  subroutine  and  using  that  as  the 
approximation.  As  can  be  seen,  the  differences  in  the  size  of  the  ab- 
solute and  relative  errors  are  on  the  order  of  one  magnitude.  Because 
of  the  difference  in  the  size  of  the  errors,  the  seventh-order  minimax 
polynomial  was  used  when  analyzing  the  flight  routine  in  chapter  5, 


thereby  ensuring  that  generated  and  analytic  errors  would  be  accurately 
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represented . 


[ ( 


Summary . When  constructing  polynomial  approximations  to  the  sine 
function,  range  reduction  techniques  using  trigonometric  identities  are 
employed.  Once  the  argument  range  has  been  reduced,  minimax  polynomials 
can  be  used  to  minimize  the  maximum  error  (either  absolute  or  relative). 
Since  the  sine  function  can  be  used  in  both  additions  and  multiplica- 
tions, both  the  absolute  error  and  the  relative  error  must  be  considered. 
Krror  plots  can  be  used  to  aid  visually  in  determining  the  error  charac- 
teristics of  the  approximation. 

Cosine  Funct i on 

Since  the  sine  and  cosine  functions  are  so  closely  related,  (cos 
(X)  = sin(ir/2  - X)),  only  one  routine  with  two  entry  points  is  needed. 
Whenever  cos(X)  is  requested,  sin(iT/2  - X)  can  be  evaluated.  One  mini- 
max polynomial  is  used  in  the  routine,  and  the  different  entry  points 
facilitated  coding  for  argument  reduction. 

Specif i cat  ions.  The  specifications  for  the  cosine  function  are  the 
same  as  those  for  the  sine  function.  As  for  the  sine  function,  the  error 
bound  of  10  was  assumed  to  he  the  bound  for  both  absolute  and  relative 
errors. 

Assumptions . Several  conditions  which  were  assumed  to  he  advantage- 
ous are  shown  below  and  are  some  of  those  listed  by  Hemker,  et.  al.  (Kef 


33:39-42) : 


1) 


lim 


= 1 


2)  Optimal  relative  accuracy  is  obtained 


Range  ReJuct i ons.  The  first  range  reduction  for  the  cosine  approx- 
imation used  the  trigonometric  identity 

cos(X)  = cos(-X)  (80) 

to  reduce  the  range  to  £o,2irJ  . This  also  guaranteed  that  the  even  char- 
acter of  the  cosine  function  would  be  preserved.  The  identity 

cos(X)  = cos(2tt-X)  (81) 

was  used  to  reduce  further  the  range  to  |\l,iij  > and 

cos(X)  = -cos(tt-X)  (82) 

was  used  to  complete  the  range  reduction,  resulting  in  a final  input 
range  of  £o,it/2j  . Hetnker's  first  condition  for  the  cosine  function  meant 
that  a power  scries  approximation  which  is  expanded  about  zero  could  not 
be  vised  near  Tr/2,  since  the  relative  error  exceeds  10  A further  range 
reduction 

cos (X)  = sin  ( n / 2— X)  (83) 

allows  the  minimax  polynomial  for  the  sine  function  to  compute  the  cosine 
function  also.  This  has  the  benefit  of  requiring  only  one  routine  with 
two  entry  points,  one  minimax  polynomial,  and  one  set  of  coefficients  to 
compute  both  the  sine  and  cosine  functions. 

Testing  Criteria.  The  cosine  approximation  was  tested  using  the 
same  criteria  as  for  the  sine  approximations.  The  same  numbers  were 
constructed  over  the  interval  ^0,2rrj  , and  value  of  (3  was  0.05. 

Solution.  The  cosine  approximation  is  computed  with  the  same  mini- 
max polynomial  used  to  compute  the  sine  approximation.  The  absolute  and 
relative  error  plots  for  the  minimax  approximation  to  the  cosine  function 
on  the  interval  £o,2nj  (using  the  mentioned  range  reduction  techniques) 
are  shown  in  Fig.  21  and  Fig.  22.  Just  as  for  the  sine  approximation, 
the  pattern  of  the  minimax  polynomial  con  he  seen  to  repeat,  with  the 
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mirror-image  effect  being  caused  by  the  range  reductions. 

Summary.  When  constructing  polynomial  approximations  to  the  cosine 
function,  range  reduction  techniques  using  trigonometric  identities  are 
employed.  The  cosine  function  can  be  evaluated  using  the  sine  approxi- 
mation if  the  identity  cos(X)  = sin(7i/2-X)  is  used.  This  results  in  a 
savings  in  core  storage  and  also  allows  the  relative  error  of  cos(X)  to 
be  minimized  even  when  X - tt/2.  Error  plots  can  be  used  to  determine 
visually  the  error  characteristics  of  the  approximation. 

Summary 

Common  mathematical  routines  such  as  sine,  cosine,  and  square  root 
are  executed  often  and  need  to  execute  as  fast  as  possible  while  still 
providing  sufficient  accuracy.  Range  reduction  techniques  enable  lower- 
order  polynomials  to  be  used  for  approximations,  thereby  reducing  the 
number  of  multiplications  required.  Since  cancellation  of  terms  can 
occur  during  range  reductions,  range  reductions  are  performed  using 
extended  precision.  Minimax  polynomials  can  be  used  to  reduce  either 
the  maximum  relative  or  absolute  error.  Hart,  et.  al.  (Ref  32)  gives 
an  extensive  collection  of  minimax  and  near-minimax  solutions  to  many 
common  functions.  Sometimes  special  applications  dictate  that  error 
criteria  other  than  the  minimax  property  be  used.  In  these  cases,  it 
is  usually  necessary  to  know  not  only  the  range  of  the  inputs,  but  also 
the  distribution.  Approximations  can  be  tested  using  Monte  Carlo  tech- 
niques in  conjunction  with  the  n-bit  simulator.  Error  plots  showing  the 
maximum  (positive  and  negative)  errors  can  be  used  to  aid  visually  in 
determining  the  error  characteristics  of  the  approximation. 
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V Anal ys is  of  Avionics  Rout i nc 

When  analyzing  the  error  characteristics  of  any  given  processor,  it 
is  often  helpful  to  analyze  computer  programs  which  would  normally  bo 
executed  on  the  processor.  This  is  especially  important  when  trying  to 
determine  if  the  wordlength  of  the  processor  is  long  enough  to  maintain 
some  specified  accuracy  for  the  programs,  given  that  the  transmitted 
error  has  been  removed.  This  chapter  contains  a detailed  discussion  of 
one  approach  for  analyzing  the  error  characteristics  of  a computer  pro- 
gram and  the  processor  on  which  it  is  executed.  Although  only  one  rou- 
tine has  been  used  for  demonstrat ion  purposes,  the  method  presented  can 
easily  be  applied  to  other  single  or  multi-variate  functions.  Tn  order 
to  be  analyzed  using  the  method  described  in  this  chapter,  these  func- 
tions must  be  capable  of  being  coded  in  FORTRAN  and  executed  on  either  a 
COC  6600  or  CD C CYBER  74  computer. 

A forward  error  analysis  is  conducted  using  the  n-bit  simulator, 
and  error  plots  are  utilized  to  aid  visually  in  determining  the  error 
characteristics  of  the  routine  and  the  computer  on  which  it  would  nor- 
mally be.  executed.  For  the  purpose  of  demonstrating  this  method,  the 
computer  being  simulated  is  the  AN/AYK-15A  digital  processor  (Kef  2). 

The  AN/AYK-15A  uses  sine  plus  two’s  complement  representations  of  float- 
ing point  numbers.  The  processor  truncates  (as  opposed  to  rounding)  and 
there  are  no  guard  bits  used. 

Bearing  To  Co  Rout i no 

The  avionics  routine  analyzed  is  the  part  of  the.  steering  function 
called  bearing  to  go,  or  desired  track  (Refs  7 and  57).  The  steering 
function  is  exercised  bv  the  AN/AYK-.15A  during  the  navigation  mode  and 
is  used  for  way-point  navigation.  Bearing  to  go  (BTC)  is  the  angle  in 


spherical  coordinates  between  true  north  and  the  selected  way-point  sub- 
tended at  the  present  position  of  the  aircraft  (using  great  circle  navi- 
gation) . 


BTG  is  depicted  in  big.  23,  where  the  coordinates  (A, .JO  represent 
the  present  aircraft  position  and  ( ) represent  the  selected  waypoint. 


BTG  is  computed  as 


whe  re 


big.  23.  Bearing  To  Go 


— 1 V 

BTG  - tail  (---  ) 
x 


CT  « sin(AT)*cos(A)-cos(AT)*sin(A)*cos(i{' ,-<(>) 


CTy  « -cos(XT)*sin(^T-4>)  (8b) 

X and  \p  are  latitude  readings  and  may  take  on  values  in  the  interval 
£-tt/2,  n/2j  , while  4*  and  d'.j.  are  longitude  readings  and  may  take  on  values 
in  the  interval  , nj  . BTG  may  take  on  values  from  the  interval  £-tt,  7,J 

and  must  have  an  "accuracy"  of  10  '**  (Kef  37:96).  This  means  that  if  X, 

<f>,  X.j,,  and  are  exactly  representable  on  the  AN/AYK-15A  (transmitted 
error  removed),  then  the  absolute  error  (positive  or  negative)  of  BTG 
must  not  exceed  10  for  any  combination  of  X,  s*> , X^,  and  <}>  . 
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This  rout  (no  was  chosou  for  an  analysis  demonstration  tor  throo 
reasons : 

it  is  easy  to  use  Monte  Carlo  techniques  to  exercise  the  code, 
since  values  for  \,  <£,  \,^ , and  can  he  generated  using  a 
pseudo-random  number  generator  (see  chapter  1)  , 

- cosine  and  sine  functions  are  used  both  in  rault iplicat ions  and 
additions  tor  subtractions),  so  the  absolute  and  relative  error 
characteristics  ot  the  sine  and  cosine  approximations  must  be 
cons idered, 

it  is  a four-variate  function,  so  techniques  demonstrated  can 
easily  be  extended  to  other  multivariate  functions. 

Bearing  To  Co  Analysis 

Objectives.  The  objectives  of  this  demonstration  are  tv'  show  how 
the  n-bit  simulator  can  be  utilized  in  a forward  error  analysis  and  to 
show  how  error  plots  can  be  utilized  to  aid  visually  in  analyzing  the 
error  characteristics  of  a multi-variate  function.  Trror  plots  are  used 
to  help  show  the  relationships  of  each  variable  to  the  maximum  absolute 
error.  By  comparing  patterns  for  different  variables,  estimates  ean  be 
obtained  for  each  variable  which  causes  certain  large  errors  to  occur. 
These  errors  often  appear  on  the  plots  as  prominent  spikes. 

The  n-bit  simulator  is  used  in  this  forward  error  analysis  for  two 
reasons.  First,  although  speetfieat  ions  exist  for  the  AN/AYK-1SA  digital 
processor,  none  have  been  constructed  yet ; and  seeond,  vising  the  n-bit 
s iron la t or  facilitates  comparing  results,  since  data  from  the  simulator 
and  the  benchmark  data  (from  the  routine  executed  without  the  simulator! 
can  be  collected  at  the  same  time  using  a driver  module. 

Approach.  The  modules  used  in  analyzing  the  navigation  routine  BIT! 


r 

are  shown  in  Fig.  24  and  Fig.  25.  The  modules  shown  in  Fig.  24  are 
compiled  directly  (without  being  preprocessed  by  the  n-bit  simulator). 

The  driver  module,  COMPAR,  collects  the  output  from  the  subroutines  SUbl 
and  SUR2  and  computes  the  absolute  and  relative  errors.  COMPAR  also 
develops  the  data  to  be  used  in  building  the  error  plots.  The  range  for 
each  variable  is  divided  into  100  evenly-spaced  intervals,  with  each  in- 
terval containing  the  maximum  absolute  and  relative  errors  (both  positive 
and  negative).  After  the  simulation  has  finished,  the  plot  array  con- 
taining the  plot  data  is  written  to  file  TAPF.4  to  be  processed  by  a 
plotting  program.  A functional  flow  diagram  of  the  driver  module  COMPAR 
is  shown  in  Fig.  26. 

Tho  subroutine  SUB1  shown  in  Fig.  24  is  used  to  evaluate  BTC  using 

the  accuracy  obtainable  with  a 48-bit  mantissa.  The  value  of  BTC  re- 
turned is  used  in  the  forward  error  analysis  as  that  which  is  assumed  to 

be  without  error.  Although  there  is  inevitably  some  error  in  BTC,  it  is 

assumed  that  it  is  enough  (three  orders  of  magnitude)  less  than  that 
produced  using  a 23-bit  mantissa  that  it  can  be  considered  to  be  exact . 

The  subroutine  RANDM  does  not  need  to  be  preprocessed  by  the  n-bit 
simulator,  since  it  returns  values  which  are  exactly  representable  with 
a 23-bit  mantissa.  RANDM  calls  the  CDC  pseudo-random  generator  eight 
times  to  get  values  to  use  in  constructing  the  exponents  and  mantissas 
which  make  up  the  four  arguments  returned.  Since  RANDM  was  called  many 
times  (20,000),  shift  operations  were  inserted  to  truncate  the  mantissa 
to  the  required  accuracy  instead  of  preprocessing  RANDM  with  the  n-bit 
simulator.  Therefore,  the  arrays  KF.Y  and  THEY  are  made  available 
through  the  COMMON  statement. 


The  modules  shown  in  Fig.  25  are  those  which  are  preprocessed  by 


PROGRAM  COMPAR ( I NPUT , OUTPUT , TAPEl=OUTFUT , TAPE4 ) 

COMMON  KEY (8) , TREY (4) 

CALL  S ETN BIT  ( pa  mine  t e r s ) 

• 

CALL  RANDM(,\,4>,  XT,<})T) 

call  subroutine  to  evaluate  BTG  using  full  precision 
CALL  SUB1(\,4>,X  ,<J)  ,BTG) 

call  subroutine  to  evaluate  BTG  using  n-bit  simulator 
CALL  SUB2  ( X , (J> , X,f , (f»  , SBTG) 

calculate  the  absolute  and  relative  errors 
AE=SRTG-BTC, 

RE=0. 0 

IF  (BTG. NE. 0.0)  RE=AE/BTG 

STOP 

END 

SUBROUTINE  SUB  1 ( X , <j> , X , <ft  , BTG ) 

subroutine  which  uses  full  accuracy  of  CDC  CYBER  74 
calls  are  made  to  CDC  library  for  mathematical  functions 
CTX  = SIN(Xt)*C0S(X)  - C0S(Xt)*SIN(X)*C0S(<J)  -<f>) 

CTY  = -C0S(Xt)*SIN(4>t-<()) 

BTG  = AT AN 2 (- CTY, CTX) 

RETURN 

END 

SUBROUTINE  RANDM(X,  <J>,  X ,<|>T) 

subroutine  to  generate  pseudo-random  inputs 
transmitted  errors  are  removed 
COMMON  KEY ( 8) , TKEY ( 4) 

RETURN 

END 


Fig.  24.  Code  to  be  Compiled  Without  Being  Preprocossed 
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SUBROUTINE  SUB2  (X,<f<,  X ,<f>  , SBTO) 

subroutine*  which  is  processed  l>y  n-blt  simulator 
calls  arc  made  to  specially  coded  mathematical  functions 
COMMON  KEY (8) , TREY (A) 

SCTX  *=  SSIN(X,j,)  * SCOS(X)  - SCOS(X>r)  * SS1N(X)  * SCOS(<J>  -<f) 

SCTY  + - SCOS(X,r)  * SS1N  (<f  -<f>) 

SBTC  - SATAN 2(-SCTY,SCTX) 

note  - In  actual  simulation  SATAN2  was  not  used; 

ATAN2  was  substituted,  and  the  value  returned  was  truncated 
RETURN 
END 

FUNCTION  SSIN(AVAl.UK) 

specially  coded  sine  approximation  (see  previous  chapter) 
COMMON  KEY (8) ,TKEY (A) 

ENTRY  SCOS 

special  entry  point  for  cosine  approximation 


RETURN 

END 

FUNCTION  SATAN 2 (AVA1.1 , AVAL2) 

specially  coded  inverse  tangent  approximation 
COMMON  KEY (8) , TKEY(A) 


RETURN 


i 

j 


. 


the  n-bit  simulator  prior  to  being  compiled.  The  subroutine  SUB2  is 
used  to  evaluate  BTG  using  the  accuracy  which  would  be  obtainable  on  the 
AN/AYK-15A  processor.  The  sine  function  is  approximated  using  the  func- 
tion subprogram  SSIN,  with  an  entry  point  SCOS  being  used  for  cosine 
approximations.  The  polynomial  used  in  SSIN  is  the  same  as  that  dis- 
cussed in  the  previous  chapter. 

The  function  subprograms  SATAN2  is  included  in  Fig.  25  to  show 
where  it  would  go  if  it  were  used.  When  evaluating  the  navigation  rou- 
tine BTC  the  CDC  library  routine  ATAN2  was  used,  with  the  mantissa  of 
the  result  being  truncated  to  the  number  of  bits  specified  (23  bits). 
This  was  because  the  arctangent  function  was  not  analyzed  as  the  sine 
and  cosine  functions  were. 

Terminal  1 on  Cri tori a.  The  procedure  used  for  computing  n,  the  num- 
ber of  trials  to  conduct,  follows  that  discussed  in  chapter  3.  The 
equation 


ln_i.PL 


“ - In  (1-P) 

is  used  to  solve  for  n once  3 and  T have  been  determined.  B,  which  is 
the  tester's  risk  that  he  accepts  bad  software,  was  arbitrarily  speci- 
fied to  be  0.01.  By  decreasing  3,  the  tester  becomes  more  confident 
that  software  he  accepts  actually  meets  the  specifications.  However, 
decreasing  3 also  increases  the  computed  value  of  n,  the  number  of 
trials  he  must  conduct.  To  determine  P,  the  equation 


(50) 


r Pi  X pi  pi  3 

= y — - y — • -J-  + o 

i=l  Wi  i,j  Wi  Wj 


(AO) 


is  used  with  all  terms  of  order  three  or  greater  being  truncated.  V, 
the  number  of  variables,  is  four.  The  four  variables  p.  are  assumed  to 


r* 


1 4 
1 
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the  maxima  obtained.  These  plots  are  shown  in  Figs.  31,  32,  33,  and  34. 
As  can  be  seen,  the  absolute  error  maximum  (either  positive  or  negative) 
is  much  less  than  that  obtained  when  t lie  .70,000  bit  patterns  were  used 
in  conjunction  with  those  5000  numbers.  Realizing  that  the  0.0 C library 
function  ATAN2  was  substituted  for  an  n-bit  approximation  for  the  arc- 
tangent function,  one  might  he  tempted  to  accept  the  HTG  function  as 
meeting  the  accuracy  specifications  if  only  the  errorplots  using  the 
5000  trials  were  considered.  However,  it  can  be  seen  by  looking  at  the 
plots  constructed  using  the  25,000  trials  that  the  maximum  error  obtained 
is  at  least  five  times  larger,  and  that  the  BTC.  routine,  if  coded  in  the 
same  manner  as  tested,  would  fail  to  meet  the  spoci filiations.  For  the 
duration  of  this  chapter,  reference  to  error  plots  will  mean  those  con- 
structed using  the  25,000  trial  numbers  (Figs.  27,  28-,  2^,  and  30). 

Once  the  routine,  as  coded,  has  been  determined  to  fail  to  meet  the 

accuracy  specifications,  it  remains  to  be  determined  whether  a judicious 

usage  of  extended  precision  might  help.  As  can  be  seen  from  the  error 

plots,  most  of  the  values  tested  fall  within  Lite  specified  limit  of  10  . 

However,  several  large  spikes  exceed  the  limit  hv  a factor  of  2 and 

several  smaller  spikes  also  exceed  the  limit.  Error  values  of  0.005  x 
-3  -5 

10  , or  0. 5 x 10  , occur  very  regularly  and  are  to  he  expected  since 

there  is  some  error  generated  within  the  sine  and  cosine  routines.  The 
large  spikes,  however,  are  not  expected.  Each  spike  which  will  be  ana- 
lyzed is  marked  by  a letter  on  each  of  the  plots.  It  is  only  conjectured 
that  the  spikes  are  related  on  each  of  tho  four  plots  by  the  letters 
assigned  them.  Although  spikes  A and  11  can  be  clearly  identified,  since 
they  are  the  absolute  maxima  (positive  and  negative),  the  others  cannot 
bo  so  positively  identified  without  more  error  trace  information  provided 


by  the  driver  module  COMPAR.  However,  for  the  purposes  of  this  analysis, 
it  is  assumed  that  spikes  A through  F can  be  identified  on  each  plot  and 
that  the  tip  of  each  spike  with  the  same  letter  identifier  is  caused  by 
the  same  set  of  trial  numbers.  This  allows  each  spike  to  be  analyzed 
independently  of  the  other  five  identified.  In  Fig.  30,  spikes  B and  C 
occur  in  neighboring  plotting  intervals,  with  spike  B occurring  to  the 
left  of  spike  C. 

Spike  Analysis.  Since  the  BTG  function  has  subtractions,  it  might 

be  hypothesized  that  the  spikes  are  caused  by  cancellation  of  terms.  If 

so,  then  the  values  of  the  input  variables  which  cause  the  spikes  should 

clearly  show  this  once  extracted  from  the  plots.  Each  of  the  six  spikes 

identified  will  be  analyzed  separately. 

Spike  A.  Spike  A is  the  maximum  positive  error  discovered,  so 

there  is  no  problem  identifying  the  values  of  A,  A^,  <{),  and  <{>  which 

cause  spike  A to  occur.  If  the  plots  for  A and  X^  are  compared,  it  can 

be  seen  that  the  spike  actually  occurs  on  the  same  plotting  interval. 

This  says  that  A-A^,.  Likewise,  the  plots  of  <j>  and  (J)^,  can  be  compared 

with  the  like  conclusion  that  This  says  that  the  aircraft  position 

( A , 4>)  and  the  waypoint  location  (A^,^)  are  approximately  the  same  place. 

Since  <p  - <p,  sin  (<J>  -<}>)-0,  and  hence  CY  -0.  COS  (<J>  -<J;)-1,  so  CT  reduces 
■la  y -i  x 

(approximately)  to  CT^sin  (A^,)  cos  (A)  - cos  (A^)  sin  (A)(1).  Now,  CT^ 

has  taken  the  form  of  sin(A^-A),  which  is  clearly  zero.  Therefore,  the 

ratios  -CT  /CT  tends  toward  0/0,  and  tan  \-CT  /CT  ) can  be  expected  to 
y x y x 

exhibit  large  errors. 

Spike  13.  For  spike  B,  A^A^,  and  so  each  must  be  approx- 

imated to  determine  why  the  spike  occurred.  As  can  be  seen  on  the  plots, 

— 22tt  3 

A=it/5,  A^,--tt/5,  <Pc*~2~5~>  and  <*,T~257T'  ^or  t'lesc  values  and  sin 
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C ~0 » so  CY  =0.  Likewise,  cos  ~~1  * and  CT^_  is  reduced  (approx- 

imately) to 

CT^-sin  (At)  (..os  (A)  - cos  (A,^)  sin  (A)  (-1) 

CT^  now  has  the  form  of  sin  (A^+A),  which  also  goes  to  zero  since  A^--A. 
Thus,  spike  B is  also  caused  by  the  instability  of  the  argument  -CY  /CT^ 
approaching  0/0. 

Spike  C.  Spike  C exhibits  the  same  characteristics  as  spike  A, 
where  A~A,^,  and  vJ'-'cj'.j,. 

Spike  0.  Spike  D exhibits  the  same  characteristics  as  spike  A, 
where  A-A,^  and 

Spike  E.  Spike  F exhibits  the  same  characteristics  as  spike  F>, 
with  A~-A^  and  tt.  Therefore,  CT^“0  and  CT^_  approaches  the  form  sin 

(At+A)  as  approaches  ir.  Since  CT^'O,  tne  argument  -CT^/CT^ 

approaches  0/0. 

Sp ike  F.  Spike  F exhibits  the  same  characteristics  as  spike  A, 
where  A^A^,  and 
Recomme.ndat  i ons 

Those  conditions  which  cause  spikes  A and  B to  occur  are  special 
cases  which  cannot  be  expected  to  occur  very  often  in  aircraft  naviga- 
tion. The  conditions  which  cause  spike  A to  occur  (A~A^,  and  are 

similar  to  the  problem  of  trying  to  use  a magnetic  compass  when  standing 
on  the  magnetic  north  pole,  and  the  conditions  which  cause  spike  B to 
occur  (A~-A,p  and  4’=4'-f-1T)  are  similar  to  the  problem  of  finding  the 
shortest  route  to  the  true  south  pole  while  standing  on  the  true  north 
pole.  However,  these  conditions  are  discovered  when  testing  with  pseudo- 
random trial  numbers. 

To  preclude  these  special  occurrences  from  affecting  the  simulated 


results,  two  arbitrary  "circles"  with  radii  r^  and  r^  should  be  con- 
structed around  the.  point  (A, 40.  The  first  circle,  with  radius  r..,  is 
used  to  detect  the  condition  when  the  aircraft  locution  and  the  waypoint 
are  approximately  the  same.  This  condition  is  detected  by  requiring 
that  the  great  circle  distance  between  (A, 4')  and  (A  ,4>,j,)  be  greater  than 
r^.  The  second  circle  is  used  to  detect  the  condition  when  the  waypoint 
is  on  the  opposite  side  of  the  earth  from  the  aircraft.  This  condition 
is  detected  by  requiring  that  the  distance  between  (A, 40  and  (A^,^)  be 
less  than  r^.  Once  these  two  conditions  have  been  imposed  on  A^,  and  <J> 
(for  a given  A and  40  » simulations  can  be  conducted  which  more  accurately 
reflect  what  might  actually  occur  in  an  aircraft  environment. 

To  compute  the  distance  (DIS)  that  (A,^, ,4>^)  is  from  (A, 40,  the 
spherical  distance  equation 


DIS  = (R+h)  *0  (87) 

may  be  used.  R is  an  approximation  (2.0926  * 10^  feet)  for  the  radius 
of  the  earth,  h is  the  aircraft  altitude  in  feet,  and  may  be  computed 
by 


CT  * cos  (BTG)  - CT  * sin  (BTC) 

oT  = tan  — - — — ' (88) 

z 

BTG,  CT  , and  CT  are  computed  as  shown  in  equations  (84),  (85),  and 
x y 

(86)  respectively,  and  CT^  is  computed  by 

CT^  = sin(A)*sin(A^,)  + cos(A)*cos(A^)*cos(4'^-4>)  (89) 

Once  DIS  is  computed,  then  those  values  of  A,  4s  A^,,  <J>  for  which  DlS<r^ 
or  DIS>r2  may  be  discarded.  Since  BTG  must  be  computed  to  get  DIS, 
different  error  plots  can  easily  be  generated  using  different  values  of 

r^  and 


It  is  conjectured  that  the  BTG  routine  would  actually  meet  the 
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accuracy  specifications  once  distance  constraints  have  been  levied 


against  A^  and  <{>r^.  This  conjecture  is  based  solely  on  the  width  of  the 
general  small  error  band  centered  about  zero  and  was  not  actually  tested. 
In  order  to  retest  the  BTG  routine  incorporating  the  distance  constraints, 
those  cases  where  (A,$)  and  (A^,  (})rj,)  fail  to  meet  the  distance  criteria 
must  be  discarded.  One  method  for  testing  using  the  distance  constraints 
would  be  to  pick  a value  of  (A,$),  and  then  generate  (A^,^)  coordinates 
until  a set  of  coordinates  is  found  which  meets  the  distance  criteria. 

If  it  is  assumed  that  the  proportion  of  the  earth's  surface  which  meets 
the  distance  criteria  for  any  given  point  is  E,  then  N,  the  number  of 
points  (A^jiJ’.p  which  one  might  expect  to  generate  for  each  point  (A,<J>)  , 
may  be  computed  by 

N = 1/ (2E)  ■ (90) 

To  test  the  BTG  routine,  n points  ( A , <J>)  would  be  generated,  where  n is 
computed  using  equation  (50).  Also,  n/(2E)  points  ( A , would  be  gen- 
erated, with  the  distance  formula  being  computed  for  each  one.  Since 
the  purpose  of  this  investigation  was  to  demonstrate  a technique  for 
analyzing  flight  routines  as  opposed  to  actually  conducting  a thorough 
analysis,  the  routine  BTG  was  not  tested  using  the  distance  criteria. 
Summary 

When  conducting  a forward  error  analysis  utilizing  the  n-bit  simu- 
lator, a driver  module  can  be  used  to  facilitate  error  data  gathering. 

Two  subroutines  can  be  used,  with  one  being  executed  using  the  full 
accuracy  obtainable  on  the  GDC  CYBER  74  and  the  other  being  reprocessed 
along  with  any  mathematical  function  approximations  needed.  The  object 
decks  for  the  n-bit  preprocessed  subroutine  and  mathematical  functions, 
the  n-bit  subroutines  (see  chapter  2),  and  the  driver  module  and  regular 
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function  are  combined  at  load  time  before  execution. 


Error  characteristics  can  be  studied  using  plots  which  show  the 
error  maxima  (positive  and  negative).  When  simulating,  it  becomes  impor- 
tant to  consider  only  those  values  of  variables  which  can  actually  occur 
in  an  operational  environment.  When  values  are  arbitrarily  chosen  from 
over  the  entire  range  of  each  variable,  errors  can  sometimes  occur  which 
affect  the  results  of  the  simulation.  Kor  this  reason,  additional  con- 
straints must  be  placed  on  the  pseudo- random  inputs  used.  The  use  of 
error  plots  can  sometimes  point  out  these  situations,  since  relationships 
which  exist  between  variables  can  also  be  observed  in  addition  to  the 
maximum  errors. 


VI  Quasi  1 i no a r i /a t i on  Method 

When  numerical  software  routines  are  analyzed  by  Monte  Carlo  tech- 
niques, the  tester  is  faced  with  determining  the  number  of  random  sam- 
ples to  use  to  ensure  that  his  risk  is  less  than  some  desired  value.  No 
matter  what  technique  is  employed  to  compute  the  number  of  samples  re- 
quired, the  tester  is  faced  with  making  assumptions  about  the  error 
characteristics  of  the  software. 

For  any  set  of  input  values  for  a software  routine,  the  error 
(either  relative  or  absolute)  can  be  uniquely  determined  given  the  char- 
acteristics of  tne  executing  computer.  In  the  following  discussion  the 
term  error  is  considered  to  mean  the  absolute  error  in  a function. 
Relative  errors  could  also  be  used,  since  they  are  obtained  by  dividing 
the  absolute  error  by  the  true  value  of  the  function  (provided  the  true 
value  of  the  function  is  not  zero).  Since  the  error  is  uniquely  deter- 
mined, it  can  be  considered  to  be  a discrete  function  of  the  implementa- 
tion of  some  algorithm  (A),  the  characteristics  of  the  machine  (M) , and 
the  independent  variables  (v^). 

ER  = f(A,M,v1,v2,...,vn)  (91) 

For  the  BTG  routine,  the  variables  v^,  i = 1,2, 3, 4,  would  represent  the 
input  variables  A,4>,A^,,  and  t}'^.  The  variables  v^  can  be  considered  to 
form  a vector  Q,  and  for  the  purposes  of  this  discussion,  A and  M rep- 
resent a given  algorithm  implemented  on  a given  computer.  Since  A and  M 
are  specified,  F.R  will  be  written  as  f (Q) . The  points  (Q.,f(Q.))  are 
considered  to  be  a subset  of  an  (n+1)  - dimensional  vector  space. 

The  quasilinearization  method  (Ref  67)  as  proposed  in  this  chapter 
is  a subopt imal  search  technique  for  finding  local  maxima  of  the  function 
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ER.  Local  maxima  which  are  discovered  are  entered  in  a population  to 
be  used  in  determining  the  stopping  criterion.  In  the  quasilineariza- 
tion method,  the  number  of  individual  tests  required  is  not  based  on 
some  hypothesized  percentage  of  successes  caused  by  each  variable,  but 
rather  on  a statistical  analysis  of  the  results  of  each  test.  Each  test 
is  a more  lengthy  process  than  a simple  evaluation  of  a set  of  random 
inputs,  however.  In  this  chapter,  the  general  quasilinearization  method 
is  discussed  as  it  might  be  applied  to  software  testing.  Also  discussed 
is  a way  in  which  test  results  can  be  analyzed  to  arrive  at  a stopping 
criterion. 

Proposed  Method 

The  quasilinearization  method  uses  a suboptimum  gradient  approach 
in  an  attempt  to  extremize  the  function  ER.  If  the  tester  knows  the 
extremum  of  ER  has  been  obtained,  he  can  say  with  certainty  whether  or 
not  the  routine  being  tested  ever  exceeds  the  error  bounds  specified. 
Since  this  method  is  a suboptimal  procedure,  however,  the  tester  seldom, 
if  ever,  knows  when  or  if  the  extremum  has  beer  obtained. 

Gradients,  when  used  in  calculus  to  extremize  multivariate  func- 
tions, generally  required  that  the  function  be  continuous  and  that  the 
first  derivative  exist  (once-differentiable)  (Ref  8:349-404).  The  func- 
tion ER,  however,  is  strictly  a discrete-valued  function  which  is  non- 
differentiable. 

It  is  assumed  that  the  values  of  the  function  ER  often  lie  within 
well-defined  limits  as  is  shown  in  Fig.  35.  The  two  lines  represent 
approximations  of  the  upper  and  lower  bounds  to  the  absolute  error  of 
the  sever.th-order  minimax  approximation  to  the  sine  function  executed 
on  the  AN/AYK-15A  computer.  The  upper  bound  as  drawn  will  be  referred 
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to  as  a continuous  and  once  differentiable  function,  since  a continuous 
and  once-di f ferent table  upper  bound  can  be  shown  to  exist  (although  not 
necessarily  the  one  drawn).  If  the  equation  of  the  upper  bound  were 
known,  then  vector  analysis  and  gradients  could  he  applied  to  extremfze 
the  function.  This  equation  is  not  known,  however.  It  is  assumed  that 
all  that  is  known  or  can  be  discovered  are  various  values  of  KR  which 
lie  somewhere  between  the  upper  and  lower  bounds.  Kor  two  values  of  KR 
(f(Qj)  and  f(<).,))  which  lie  arbitrarily  close  to  the  upper  bound  and 
which  are  a "small"  distance  apart,  a straight  line,  or  linear  spline, 
connecting  the  two  points  (Q|,f(Qj))  and  (t>.T , f (Q.,) ) can  be  obtained. 

This  spline  provides  a linear  approximation  to  the  upper  bound  and  the 
slope  of  the  spline  is  used  as  a directional  difference  approximation 
for  the  gradients  at  the  two  points.  'IVo  values  Qj  and  Qj?  can  be  found 
(with  some  difficulty)  for  which  f(Q^)  and  f (Q^)  lit*  arbitrarily  close 
to  the  upper  bound.  The  method  of  obtaining  the  values  Q^  and  will 
be  explained  later,  lly  using  the  spline  and  its  slope,  normal  methods 
for  extremiz ing  continuous  and  once- d l f ferent  I able  functions  can  be 
applied.  The  method  discussed  is  a linear  exploration  in  which  searches 
are  conducted  only  in  directions  parallel  to  one  axis  and  perpendicular 
to  all  other  axes.  Several  other  methods  are  discussed  by  Beveridge 
and  Schechter  (Ref  4)  and  by  Wilde  (Kef  64).  Before  the  algorithm  Is 
presented,  the  concepts  of  moving  to  the  surface,  obtaining  a spline, 
and  jumping  are  discussed. 

Moving  to  the  Surface.  The  process  of  determining  a point  (Qj, 
f (Qj) ) for  which  the  function  f(Qj)  lies  arbitrarily  close  to  the  upper 
bound  will  be  referred  to  ns  "moving  to  the  surface".  For  any  sot  of 


ariables  V| , v., , . . . , v making  up  Q^,  the  error  function  KK,  or  f (Q  j ) l* 


known  to  Ho  somewhere  betwo.on  the  upper  and  lower  hounds.  Each  vari- 
able Is  then  perturbed,  one  at  a time,  by  the  smallest  amount  possible 
m times  in  both  tin-  positive  and  negative  directions  (with  the  other 
variables  remaining  fixed)  and  I'l\  is  evaluated  for  each  perturbation. 

Tl\e  smallest  amount  possible  means  that  the  new  value  is  the  number 
which,  on  the  float  ing-point  number  Hue  for  the  given  machine,  lies 
adjacent  to  the  original  number.  The  value  of  m^  is  arbitrary  depending 
on  the  effort  one  is  willing  to  expend.  As  more  effort  is  expended, 
more  perturbations  are  evaluated,  thus  raising  the  tester's  confidence 
that  he  lias  obtained  a point  for  which  the  error  is  within  a specified 
distance  of  the  surface.  The  maximum  value  of  EK  obtained  from  all 
these  evaluations  (at  some  point  (Qj,f(Q.)))  is  assumed  to  He  suffic- 
ient ly  close  to  ttie  upper  bound,  and  one  lias  thus  "moved  to  the  surface". 

Obtaining  the  SplJjie.  One  of  the  most  difficult  parts  of  the 
quasi  linear! zat ion  method  is  to  find  two  values  and  which  are 
close  together  (using  Euclidean  distance  measures  for  vectors)  and  for 
which  f(Qj)  and  f (Q^)  lie  arbitrarily  close  to  the  upper  bound,  since 
the  upper  bound  is  never  known.  The  success  of  the  process  described 
here  is  related  to  the  effort  expended,  just  as  in  the  case  of  moving 
to  the  surface. 

Once  a point  (Q^,f(Q^))  has  been  obtained,  another  point 
must  bo  obtained  in  order  to  construct  n linear  spline,  since  linear 
splines  are  uniquely  determined  by  two  points.  To  get  (Q^ ,f (Q0)) , the 
process  of  moving  to  the  surface  is  repeated  at  point  (Q^,f(Qj)),  with 
,n2  (Jc2)  perturbations  being  used  in  each  axis  direction.  The  point 
<Q2,f(Q2>)  must  be  different  from  (Qpf(Qj))  and  f(Q^)  must  be  greater 
than  or  equal  to  f(Q^). 
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If  a point  (Q^ • f C'^2^ ^ can  be  found  which  satisfies  these  conditions, 
then  a linear  spline  can  ho  constructed  between  the  points  (Qj,f(Qj)) 
n,‘d  If  dCQj.l^)  represents  the  Euclidean  distance  between 

and  t)^,  then 

f(Q.,)  - f(Q.) 

-dfg^)—  ■ sWe«2>  w> 

is  considered  to  be  the  slope  of  the  spline  and  is  used  as  a directional 
difference  approximation  to  the  gradient  at  point  (Q^,f(Qj)). 

If  no  point  can  he  found,  then  the  point  (Q  ,f(Q .))  is  not 

assumed  to  be  a true  local  maximum.  The  reason  for  this  is  that  tin' 
point  (Qj,f(Q,))  may  not  actually  he  a true  local  maximum,  since  the 
existence  of  sharp  ridge  lines  can  produce  false  maxima  using  linear 
exploration  techniques  (Ref  64:65-68).  For  this  reason,  whenever  a 
local  maximum  has  been  discovered  (whether  true  or  false),  the  linear 
exploration  can  be  expanded  to  search  along  directions  not  previously 
searched  using  axis  rotations.  Although  this  does  not  guarantee  that 
the  final  local  maximum  obtained  is  not  a false  one,  it  does  provide  an 
improvement  to  the  basic  linear  exploration  in  that  problems  with  ridge 
linos  will  be  reduced  (but  still  not  eliminated).  The  treatment  of 
local  maxima  will  be  discussed  later. 

Jumping.  Once  two  points  (Qj  , f (Q ^ ) ) and  (Q0,f(Q.)))  have  been  found 
and  the  linear  spline  has  been  constructed,  a jump  is  made  to  find  a new 
point  which,  after  moving  to  the  surface  to  (Q^,f(Q^)),  may  or  may  not 
produce  a monotone  nondecrcasing  sequence  of  values  of  F.R.  The  jump  is 
made  based  on  the  differences  in  the  values  of  one  of  the  v ^ variables 
(the  one  which  is  different  between  Q and  Q.,)  and  the  slope  of  the 
spline  (the  steeper  t lie  slope,  the  samller  the  Jump).  This  jump  is 
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shown  for  a two-dimensional  case  In  Fig.  30.  The  value  “Jumped  to"  Is 
shown  as  Y,  and  Is  obtained  after  moving  to  the  surface  from  location 
Y. 


Fig.  3b.  Quasi linearizat Ion  Jump 


If  f(Q.j)  Is  greater  than  or  equal  to  f(Q,3,  then  t lie  jump  has  been  suc- 
cessful, and  the  point  (Q^,f(Q^))  becomes  the  new  point  (Q^,f(Q^)),  with 
the  process  of  obtaining  a spline  and  jumping  starting  again. 

If  f (Q  j)  is  less  than  f(Q.,3,  then  the  Jump  has  not  been  successful 
and  a new  Jump  must  be  made.  The  length  of  the  jump  is  cut  in  halt,  and 
the  jump  is  repeated,  followed  by  the  move  to  the  surface  to  establish 
(Qj,  f (Q-j)) . If*  after  some  arbitrary  number  of  Jumps,  no  value  Qj  can 
be  found  for  which  f (Q^)  is  greater  than  or  equal  to  f (Q0) , then  the 
point  (Q,,f(Q.,l)  can  become  the  new  point  (Q^»f(Qj)),  with  the  process 
of  obtaining  the  spline  and  jumping  starting  again.  This  assures  the 
existence  of  a monotone  nondecreasing  sequence  of  points  (Qj,f(Qj)) 
based  on  the  value  of  i(Qj).  The  process  of  obtaining  a spline  and 
jumping  terminates  when  a spline  cannot  he  obtained,  and  the  point 
(Qj , f (Qj))  is  then  assumed  to  be  a local  maximum.  As  mentioned  pre- 
viously, it  is  possible  for  (Q^,f(Qj))  to  be  a false  local  maximum.  A 
flow  diagram  for  jumping  is  shown  in  Fig.  37. 
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Fig.  37.  Quasi linearization  Jump  Flow  Diagram 


Algorithm.  Each  test  consists  of  using  Monte  Carlo  techniques  to 


obtain  an  initial  starting  point.  Each  input  variable  v comprising 
the  vector  Q is  generated  using  the  entire  variable  range  to  construct 
uniformly-distributed  pseudo-random  values,  as  opposed  t o the  techniques 
discussed  in  chapter  3.  From  this  point  Q,  the  process  of  moving  to  the 
surface,  obtaining  the  spline,  and  jumping  is  repeated  until  a local 
maximum  is  obtained.  This  local  maximum  is  then  entered  as  one  value  in 
a population  which  is  statistically  analyzed  to  determine  the  stopping 
point.  As  each  local  maximum  is  entered  in  order  In  the  population,  it 
is  identified  as  being  either  an  "old"  sample,  meaning  that  that  parti- 
cular value  (or  one  arbitrarily  elose  to  it)  was  previously  obtained  as 
a local  maximum,  or  else  it  Is  entered  as  a "new"  sample. 

Analyses  of  t'opul at  ion  of  Maxima.  It  is  assumed  that  there  are  N 
local  maxima  which  can  theoretically  be  found  and  each  has  an  equally 
likely  chance  of  occurring.  The  problem  of  analyzing  the  population  of 
"new"  and  "old"  samples  to  estimate  the  stopping  point  for  the  algorithm 
thus  becomes  one  of  estimating  the  value  of  N.  it  is  also  assumed  that, 
of  t he  N local  maxima  which  can  be  found,  one  is  larger  than  the  rest 
and  is  therefore  identified  as  the  extremum  of  the  function  l'K  over  the 
ranges  of  the  inputs. 

If  the  tester  has  not  found  a success  (value  of  f (Q)  which  exceeds 
some  specified  error  bound)  after  ?.  "new"  samples  have  been  entered  in 
the  population,  the  tester  would  at  least  like  to  have  a confidence  of 
1-ct  that  the  largest  local  maximum  found  is  the  extremum.  For  each  of 
the  N local  maxima,  the  probability  of  it  being  the  extremum  is  1/N. 
After  7.  new  local  maxima  have  been  discovered,  there  are  N- ?.  remaining 
to  be  discovered.  The  probability  f,  that  none  of  the  N-z  undiscovered 
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the 


§ > 1-cx  (94) 

Solving  for  z in  terms  of  N yields 

z > N(l-a)  (95) 

Thus,  the  problem  becomes  one  of  estimating  N given  the  sequence  of 
"old"  and  "new"  samples  entered  in  the  sample  population  of  local  maxima 
Whenever  a "new"  sample  is  entered  in  the  population,  the  cumula- 
tive "time"  to  that  point  (one  time  quantum  = one  trial)  is  also  entered. 
It  is  assumed  that  the  sample  points  come  from  a continuous  distribution, 
even  though  the  cumulative  number  of  trials  is  discrete.  This  allows  an 
empirical  distribution  to  he  constructed.  For  the  sample  sequence 
nnonoonooonoooooon,  the  empirical  distribution  looks  like  that  shown  in 
Fig.  38.  If  the  type  of  distribution  and  the  associated  parameters  can 

A 

be  determined,  then  N can  be  obtained  as  an  estimate  for  N. 

When  determining  the  type  of  distribution  which  best  fits  the  sam- 
ple data  points,  many  distributions  should  be  tried.  "Best"  fits  can 
be  obtained  using  either  the  Legendre  least  squares  method,  the  Gregory- 
Newton  method,  the  mean  average  method,  or  the  method  of  weighted 
residuals.  In  each  case,  the  parameters  of  the  distributions  must  be 
obtained  analytically  from  the  data.  Once  the  distributions  have  been 
fitted  to  the  data,  the  Kolmogorov-Smi mov  goodness-of-f it  test  can  bo 
used  to  eliminate  those  distributions  which  do  not  fit.  the  sample  data. 
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Cumulative  Time  to  New  Sample  Occurrence 


The  likelihood  ratio  test  can  then  be  used  to  determine  which  of  the 
remaining  distributions  provides  the  best  fit.  Using  this  distribution, 

A A 

an  estimate  N for  N can  be  obtained.  N can  then  be  substituted  for  N in 

z>N(l-ot)  (95) 

thereby  giving  the  tester  the  number  of  new  samples  he  must  obtain  to 
have  a confidence  of  1-a  that  the  extremum  has  been  encountered. 

Obtaining  a "reliable"  estimate  N for  N can  require  a great  deal  of 
computation,  so  it  is  not  recommended  that  it  be  done  after  each  sample 
is  entered  in  the  population  of  local  maxima.  Rather,  it  can  be  done 
after  a certain  pattern  of  "old"  and  "new"  samples  occurs.  Such  a pat- 
tern might  be  a "new",  followed  by  at  least  ten  "olds",  followed  by 
another  "new". 

If  the  problem  is  considered  to  be  one  strictly  dealing  with  dis- 
crete-valued functions,  then  the  problem  is  analogous  to  one  of  drawing 
balls  from  an  urn.  The  objective  is  to  find  an  estimate  for  the  number 
of  balls  in  the  urn  using  random  sampling  with  replacement.  There  are 
N balls  in  the  urn  (N  unknown)  and  initially  all  the  balls  are  white. 

A ball  is  drawn  at  random,  with  the  order  of  the  draw  and  the  color  of 
the  ball  being  recorded.  The  ball  is  then  painted  black  (regardless  of 
its  color)  and  replaced  in  the  urn.  This  process  of  drawing  balls  one 
at  a time,  recording  their  color  and  the  sample  order,  painting  them, 
and  then  returning  them  to  the  urn  is  repeated  until,  with  some  level 
of  confidence  1-a  (a  different  from  a),  it  can  be  determined  that 

N(l-e)  < N < N(l+e)  (96) 

where  N is  an  estimate  for  N.  The  frequency  distribution  covering  the 
pattern  of  random  samples  is  unknown.  Using  Monte  Carlo  techniques,  a 
good  approximation  to  the  distribution  can  be  obtained  empirically.  To 


obtain  the  estimated  frequency  distribution,  a population  with  a known 
number  N of  white  balls  is  constructed.  Samples  are  drawn  randomly 
using  the  painting  and  replacement  criteria  until  a specified  number  of 
white  balls  have  been  drawn.  Sampling  is  then  terminated  and  the  dis- 

A 

tribution  estimated.  based  on  the  empirical  distribution,  N can  b<_  ob- 
tained as  an  estimate  for  N,  which  is  known.  The  process  can  then  be 
repeated  using  different  size  populations  to  verify  that  the  method  of 

/V  /A 

determining  N still  produces  a good  approximation  to  N.  Once  N can  be 
determined  with  a confidence  of  1-a,  then  the  calculations  for  obtaining 

/A 

N can  be  applied  to  the  sequence  of  "new"  and  "old"  samples  in  the  pop- 
ulation of  local  maxima  to  determine  an  estimate  of  the  total  number  of 
"new"  samples  which  might  occur.  Then,  using 

z > K(l-a)  (97) 

the  stopping  criterion  of  finding  at  least  z "new"  samples  can  be 
determined. 

Summary . The  quasilinearization  method  employs  a suboptimal  search 
technique  in  an  attempt  to  locate  the  local  maxima  of  the  error  func- 
tion ER.  Several  important  assumptions  were  made  in  developing  the 
quasilinearization  method: 

an  upper  bound  exists  for  the  function  ER, 

this  upper  bound  can  be  approximated  by  constructing  linear 
splines, 

two  points  can  always  be  found  which  lie  arbitrarily  close  to 
the  surface  (upper  bound), 

there  are  a finite  number  of  local  maxima  which  can  be  found, 
each  local  maximum  has  an  equally  likely  chance  of  occurring, 
and 


97 


- one  local  maximum  is  larger  than  the  rest  and  Is  the  extremum. 

Local  maxima  are  found  by  starting  at  random  locations  and  then 
alternately  moving  to  the  surface,  obtaining  splines,  and  jumping  until 
splines  can  no  longer  be  obtained.  Each  local  maximum  thus  obtained  is 
entered  into  a population  of  other  local  maxima  and  is  marked  as  being 
either  a "new"  maximum  or  an  "old"  one.  Whenever  a specified  pattern 
of  "new"  and  "old"  samples  have  occurred,  the  population  of  local  maxima 
is  statistically  analyzed  to  obtain  an  estimate  of  the  total  number  of 
"new"  local  maxima  which  can  theoretically  be  found.  This  estimate, 
when  multiplied  by  1-a,  gives  the  tester  the  number  of  "new"  local 
maxima  to  obtain  before  stopping.  If,  at  any  time,  a value  f(Q)  is 
obtained  which  exceeds  the  specified  error  bounds,  the  process  stops 
and  the  tester  is  able  to  reject  the  routine  being  analyzed. 


VII  Colic  Itis I ons  .util  Kv-i  ommciid.tf  i oils 


Thi*  purpose  til  Mils  Investigation  Is  In  develop  tools  and  technique 
which  can  lie  list'd  to  deteimlne  It  a given  computer  can  stilve  a given 
avionics  signal  pt  (trussing  problem  within  certain  specified  error  and 
time  tolerances.  Specifically,  the  following  goals  were  defined: 

develop  a tool  to  simulate  accurately  the  computational  charac- 
teristics of  any  digital  processor 

produce  common  library  routines  which  are  optimal  in  the  sense 
that  they  try  to  maximize  both  absolute  and  relative  aeeuvacy 
and  at  the  same  time  minimize  the  number  of  Instructions 
(especially  multiplications)  required 

demonstrate  the  effectiveness  of  the  simulation  tool  mentioned 
above  to  analyze  the  error  characteristics  of  a given  class  of 
algorithms  and  associated  routines. 

Knelt  of  these  three  goals  were  met  with  one  exception.  Only  t ou- 
tlnes  for  the  square  root,  sine,  and  cosine  functions  were  developed 
and  analyzed.  Avionics  algorithms  also  incorporate  the  arctangent 
function,  which  was  not  analyzed. 

Colic  I us  i ons 

The  n-hit  simulator  can  be  used  effectively  in  a forward  crier 
analysis.  Several  enhancements  were  made  to  the  n-hit  simulator  to 
simulate  more  accurately  the  numerical  effect.'!  caused  by  linitc  word- 
lengths  of  various  computers.  Numerical  routines  can  bo  evaluated  by 
first  coding  them  in  FORTRAN  as  two  Identical  subroutines  controlled  by 
a driver  module.  The  first  subroutine  uses  CHC  library  routines  and, 


together  with  the  driver,  is  compiled  without  being  proproeossod  by  the 
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n-blt  simulator.  Tho  socotul  sub  rout  1 no  and  tlio  associated  function  sub- 
programs arc  pi oprooossoi!  by  the  n-blt  simulator  before  being  compiled. 
The  driver  moduli1  determines  the  error  characteristics  by  comparing  tho 
values  returned  from  tho  two  subroutines. 

A simulation  analysis  bused  on  Monte  Carlo  techniques  requires  that 
sequences  of  events  ho  generated  where  each  sequence  obeys  a probability 
law  governing  a particular  component  of  the  random  behavior  in  question. 
One  law  commonly  encountered  in  simulation  studies  assumes  that  events 
in  a sequence  are  independent  and  idoutioallr  distributed  (Kef  19:1(>7). 
Therefore,  when  pseudo-random  numbers  arc  utilized,  they  should  he  tested 
for  randomness,  correlation,  and  goodness  of  tit  to  a specified  disti i- 
but ion. 

Since  floating-point  numbers  are  not  uniformly  dense  on  the  entire 
number  line,  but  only  over  short  suhintei vals,  trial  numbers  are  con- 
structed which  provide  a pseudo-random  sampling  of  the  representable 
numbers,  with  each  number  in  the  interval  of  consideration  being  equally— 
likely  to  be  drawn.  Kxt  ra  trial  numbers  can  also  be  constructed  to  test 
the  error  characteristics  near  the  interval  boundaries.  Krror  plots 
which  plot  maximum  errors  in  short  plotting  intervals  against  each  input 
variable  can  also  be  utilized  to  aid  visually  in  determining  error  char- 
acter 1st  ics. 

When  testing  avionics  algorithms,  special  mathematical  approxima- 
tions for  sine,  cosine,  and  arctangent  should  be  used  to  reflect  more 
accurately  the  error  characteristics  of  the  software  as  If  it  were 
actually  executed  on  the  computet  being  simulated.  Minima*  polynomial 
approximations  can  be  used  to  minimize  the  maximum  error  (either  absol- 
ute or  relat  ive) . 
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Only  one  avionics  routine  (Bearing  To  Go)  was  analyzed.  Since  the 
Bearing  To  Go  routine  is  a four- variate  function,  techniques  demonstrated 
with  this  routine  can  easily  he  extended  to  other  multivariate  function 
routines.  Error  plots  clearly  show  that  the  specified  error  hounds  were 
exceeded.  At  least  some  of  the  input  values  which  cause  these  error 
specifications  to  he  exceeded  might  he  considered  as  never  occurring  in 
an  operational  environment.  The  routine,  Bearing  To  Go,  was  not  evalu- 
ated using  only  those  inputs  which  might  he  expected  to  occur  in  an 
operat  tonal  cttvi  ronraent. 

The  quasi  1 incarizat  ion  method  as  proposed  is  an  heuristic  approach 
to  finding  local  maxima.  Since  the  method  uses  searches  which  are  con- 
ducted only  in  directions  parallel  to  the  variable  axes,  the  maximum 
value  of  the  error  function  found  on  any  one  test  may  not  be  a true  local 
maximum.  Sharp  ridge  lines  can  prevent  the  proposed  search  technique 
from  finding  true  local  maxima.  Each  test  consists  of  first  determining 
a random  starting  point  and  then  performing  a series  of  events  called 
moving  to  the  surface,  obtaining  a spline,  and  jumping  to  produce  a 
monotone  nondecreasing  sequence  of  values  of  the  error  (either  absolute 
or  relative)  function.  The  maximum  value  obtained  on  each  tost  and  its 
location  with  respect  to  the  input  variables  is  entered  in  a population 
to  be  statistically  analyzed  to  determine  the  stopping  criteria.  Test- 
ing stops  when  the  tester  reaches  a desired  level  of  confidence  that 
the  largest  local  maximum  discovered  is  the  extremum,  or  when  the  error 
limitations  are  exceeded. 

Re  commend  at  i oils 

There  are  four  different  areas  which  are  recommended  for  further 
development.  The  n-blt  simulator  needs  four  enhancements  to  make  it 
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more  flexible  for  use  in  forward  error  analyses  and  perturbation  analy- 
ses. The  special  purpose  mathematical  routines  used  in  avionics  pro- 
grams merit  more  study  (especially  the  arctangent  function  approxima- 
tion). The  specifications  (Ref  57)  for  the  avionics  algorithms  should 
be  more  explicit  in  defining  accuracy,  and  once  the  limits  on  the  input 
variables  have  been  more  clearly  defined,  more  study  needs  to  be  done  on 
the  accuracy  attainable  in  the  AN/AYK-15A  processor.  Extensive  work  is 
needed  to  develop  rigorously  the  modified  quasilinearization  method, 
with  particular  emphasis  on  attaining  and  then  verifying  the  presence  of 
local  maxima  and  also  on  the  movement,  or  jumping,  techniques. 

N-b:it  Simulator.  The  first  two  enhancements  to  be  considered  for 
the  n-bit  simulator  are  those  recommended  by  Klein  (Kef  37).  The  first 
recommendation  is  to  provide  more  flexibility  in  handling  overflow,  and 
the  second  recommendation  is  to  modify  the  n-bit  preprocessor  to  sub- 
stitute in-line  code  instead  of  subroutine  calls,  thereby  improving  the 
execution  time  of  the  n-bit  simulated  program. 

The  third  enhancement  is  to  add  to  the  existing  n-bit  simulator 
subroutines  another  subroutine  which  will  generate  pseudo-random  numbers 
which  represent  a random  sampling  of  all  the  floating-point  numbers 
representable  in  a given  range,  and  therefore  are  uniformly-distributed 
equally  over  each  subinterval  which  contains  numbers  with  the  same  expo- 
nent. 

The  fourth  recommendation  is  to  add  to  the  n-bit  simulator  subrou- 
tines a subroutine  which  returns  floating-point  numbers  adjacent  to  one. 
which  is  input.  Inputs  to  this  subroutine  might  be  a starting  number, 
direction  to  go  (positive  or  negative),  and  number  of  values  to  return. 
Outputs  from  this  subroutine  would  include  the  values  requested  and  a 
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status  flag  indicating  either  no  error  or  overflow  (either  positive  or 
negative).  The  presence  of  zero  as  a value  returned  can  also  be  indi- 
cated. 

Matheiuat leal  Routines.  All  the  special-purpose  mathematical  rou- 
tines investigated  used  extended-precision  to  perform  argument  reduc- 
tions and  single-precision  to  perform  the  polynomial  approximations  (in 
the  case  of  the  trigonometric  approximations)  or  Newton  iterations  (in 
the  case  of  the  square  root  approximations).  The  trigonometric  approxi- 
mations, and  especially  the  arctangent  function,  merit  further  study 
using  extended  precision  and/or  higher-order  polynomials  for  use  in  an 
avionics  environment. 

Avion i.cs  Routines.  Error  tolerance  limits  need  to  be  defined  more 
clearly  in  the  specifications.  Errors  were  found  which  exceeded  the 
limits  in  the  specifications,  and  although  the  inputs  used  clearly  fall 
within  the  variable  ranges  defined,  it  is  highly  unlikely  that  these 
values  would  occur  in  an  operational  environment.  Without  specifications 
which  also  show  the  limitations  placed  on  the  relationships  between 
variables,  correct  analyses  cannot  be  performed.  One  example  of  this 
would  be  specifying  certain  combinations  of  inputs  which  do  not  need  to 
meet  the  normal  error  specifications.  For  the  Bearing  To  Go  routine, 
this  might  mean  that  waypoints  within  a 1-mile  radius  or  beyond  a 3000- 
mile  radius  of  the  aircraft  would  not  be  subject  to  the  error  bounds 
specified  for  those  waypoints  which  lie  between  1 and  3000  miles  from 
the  aircraft.  The  numbers  1 and  3000  are  used  here  strictly  for  the 
purposes  of  the  example. 


Quasi  linear i zatlon  Method.  The  quasilinearization  as  proposed 


should  be  tested  on  both  single  and  multivariate  functions  to  determine 


its  merit  relative  to  other  optimization  ami  suboptimization  techniques. 
Various  acceleration  techniques  should  hi-  investigated  to  determine 
their  effect  on  the  convergence  of  the  proposed  method.  The  concept  of 
moving  to  the  surface  should  be  examined  more  closely  to  determine  the 
number  of  points  to  examine  in  each  direction.  As  the  probability  of 
being  within  an  arbitrarily  small  distance  from  the  "surface"  increases, 
better  jumping  techniques  should  be  able  to  be  utilized.  Searching 
techniques  should  be  incorporated  which  are  not  limited  to  directions 
parallel  to  an  axis.  This  would  allow  ridge-following  algorithms  to  be 
developed,  thereby  increasing  the  probability  that  the  extreme  value 
returned  by  any  single  test  is  in  fact  a true  local  maximum.  As  the 
probability  of  obtaining  true  local  maxima  goes  up,  the  number  of  "new" 
samples  entered  in  the  sample  population  should  decrease  correspondingly, 
thereby  allowing  for  fewer  tests  to  be  conducted. 

Better  methods  for  analyzing  the  sample  population  should  be  in- 
vestigated. As  searching  techniques  become  more  complex,  the  population 
should  be  analyzed  more  frequently,  hence  the  need  for  a more  efficient 
algorithm  to  determine  the  stopping  criteria.  Methods  proposed  should 
be  able  to  bo  applied  to  the  ball  and  urn  problem  discussed  in  Chapter  b. 
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Appendix  A 

User's  M nnu al  f or  t he  FJ r s t JRe vision 
of  tito  N-bit  Simulator 


By  using  the  first  revision  of  the  n-bit  simulation  tool,  a user 
can  evaluate  the  numerical  effects  that  various  computer  architectures 
can  have  upon  a FORTRAN- programmed  algorithm.  The  simulator  was  designed 
to  be  executed  on  either  CDC  6600  or  CYBER  74  computer  systems  and  re- 
quires that  programs  being  executed  by  the  simulator  be  expressed  in  the 
FORTRAN  IV  (extended)  programming  language  with  a few  restrictions. 

This  user's  manual  will  describe  tbe  ten  user  options  available  for 
describing  different  computer  architectures  and  the  control  cards  for 
executing  the  first  revision  of  the  n-bit  simulator.  The  programming 
restrictions  and  considerations  applicable  to  the  original  simulator 
still  apply  (Ref  37). 

User  Options 

To  provide  the  user  with  greater  flexibility  in  simulating  the 
effects  of  various  computer  architectures,  six  of  the  seven  options  of 
the  original  n-bit  simulator  and  four  additional  options,  for  a total 
of  ten  user  options,  were  incorporated  into  the  first  revision  of  the 
simulator.  These  changes  require  only  minor  modifications  on  the  part 
of  the  user. 

First,  two  arrays  must  now  be  dimensioned  in  each  routine.  The 
array  named  KEY  must  be  dimensioned  to  eight  (e.g.  DIMENSION  KEY (8) ) and 
the  array  named  TKEY  must  be  dimensioned  to  four  (e.g.  DIMENSION  TKEY 
(4)).  The  array  TKEY  holds  the  floating  point  values  to  be  used  in  the 
overflow  and  underflow  checks,  and  the  array  KEY  holds  the  fixed  point 
overflow  values  and  the  user  options. 


Second,  tho  Sri'NIHT  rout  i no  which  is  called  as  the  first  executable 


statement  and  wlie rover  else  dosirod  now  lias  twelve  parameters,  two  of 
which  are  usot  opt  ions,  and  the  two  arrays  KEY  and  TKl’Y.  The  required 
form  of  the  SKTKBIT  subroutine  call  is  shown  below,  with  each  f repre- 
senting a user  option. 

CALL  Si: TN 1' IT  (# l,f?  ?,*3,#4,  ,#6,  ?7,  < 8,  #9,  f 10, KEY  ,TKFY) 

Opt_ion  ! . Tin-  first  option  allows  the  user  to  specify  the  number 
of  hits  per  wordlength  (from  4 to  C>0)  used  ior  floating-point  ar  I timet  i r. 

tip t i on  !■?.  The  second  option  allows  the  user  to  specify  the  number 
of  bits  in  the  mantissa,  not  counting  the  sign  bit,  for  floating-point 
words  (from  1 to  48). 


Op  1 1 on  i 'Ll,  This  option  allows  the  user  to  specify  the  number  of 
guard  digits  to  be  used  during  intermediate  floating-point  calculations. 
This  option  teplaced  the  single,  double,  or  triple  precision  option  of 
the  original  simulator  and  has  no  effect  on  fixed-point  calculations. 

All  floating-point  variable  assignments  are  still  made  using  the  word- 
length  specified  in  the  first  option. 

Op  t i on  ■•4.  Option  four  allows  tho  user  to  specify  the  number  of 
bits  to  be  used  in  the  floating-point  exponent,  including  the  exponent 
sign  bit  (from  2 to  11).  The  sum  the  values  for  option  four  and  option 
two  must  he  one  less  than  that  for  option  one. 

Opt i on  "‘S.  The  fifth  option  allows  the  user  to  specify  the  position 
of  the  implied  binary  point  with  respect  to  the  mantissa.  The  value  0 
positions  the  point  to  the  left  of  the  mantissa  (making  the  mantissa  a 
fraction)  and  the  value  1 positions  the  point  to  the  right  of  the 
mantissa  (making  the  mantissa  an  integer). 

Option  «••(>.  Option  six  allows  the  user  to  specify  whether  rounding 
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or  truncation  will  be  performed  after  each  operation.  The  value  0 indi- 
cates truncation  and  the  value  1 indicates  rounding. 

Option  I! 7 . This  option  was  added  to  allow  the  user  to  specify 
whether  the  computer  being  simulated  uses  sign  plus  one's  complement  or 
sign  plus  two's  complement  arithmetic.  Sign-magnitude  machines  should 
be  simulated  by  specifying  sign  plus  one's  complement.  The  value  1 is 
used  to  specify  sign  plus  one's  complement  and  the  value  2 is  used  to 
specify  sign  plus  two's  complement  arithmetic. 

Option  It 8.  Option  eight  was  added  to  allow  the  user  to  specify 
whether  the  machine  being  simulated  is  a binary,  quaternary,  octal,  or 
hexadecimal  machine.  The  value  1 is  used  to  select  binary,  the  value  2 
for  quaternary,  the  value  3 for  octal,  and  the  value  4 for  hexadecimal. 

Option  If 9.  The  ninth  option  allows  the  user  to  suppress  the  print- 
ing of  overflow  and  underflow  messages.  These  messages  specify  whether 
overflow  or  underflow  occurred,  whether  positive  or  negative,  and  the 
routine  and  line  number  where  it  occurred.  If  overflow  occurs  on  the 
actual  CDC  word,  the  program  will  terminate. 

Op t i on  If  10.  This  option  was  added  to  allow  the  user  to  specify 
the  wordlength  used  for  fixed-point  calculations.  This  will  allow  for 
valid  overflow  checking  for  both  fixed-point  and  floating-point  without 
having  to  call  SETNBIT  whenever  switching  from  one  type  to  the  other. 

The  maximum  value  allowed  is  48. 

An  example  of  a typical  SETNBIT  subroutine  call  is 

CALL  SETNBIT  (32,23,0,8,0,0,2,1,1, 16, KEY, THEY) 

The  argument  values  shown  specify  a 32-bit  floating-point  word,  broken 
down  into  1 sign  bit,  8 exponent  bits,  and  23  mantissa  bits,  with  no 
guard  bits.  The  binary  point  is  on  the  left  of  the  mantissa,  and  the 


binary  machine  does  two's  complement  nri thmet i c with  truncation.  Over- 
flow and  underflow  messages  are  to  be  printed,  and  fixed-point  calcula- 
tions are  performed  using  a lb-bit  wordlength.  If  double-precision 
floating-point  variables  are  used,  the  first  and  second  options  should 
be  changed  to  reflect  the  correct  wordlength,  thus  allowing  for  double- 
precision  calculation  and  storage.  This  must  he  distinguished  from  dou- 
ble-precision calculation  with  single- precision  .storage,  which  can  hi' 
accomplished  by  specifying  the  correct  number  of  guard  digits. 

Control  Cards 

A very  basic  understanding  ot  how  the  simulator  works  helps  in  con- 
structing job  control  cards  to  simulate  a program.  The  preprocessor, 
which  is  executed  first,  reads  as  data  input  the  FORTRAN  program  to  he 
simulated,  modifies  the  assignment  statements,  and  writes  out  a file 
containing  the  source  of  the  modified  program.  The  program  on  this  file 
must  then  he  compiled  and  executed.  As  it  executes,  it  will  call  the 
subroutines  which  are  loaded  with  the  objects  of  the  compiled  program. 

There  are  many  different  ways  to  build  a deck  to  execute  a program 
using  the  n-hit  simulator.  One  very  simple  deck  setup  using  source 
cards  for  the  preprocessor,  the  subroutines,  and  the  FORTRAN  program  to 
he  simulated,  is  shown  below. 

FTN,  I .= DUMMY  1 . 

1.00, , , NUTT. 

REW1ND.NB1T. 

FTN,  I.* DUMMY 2 , B*SUBS . 

REWIND, SimS. 

FTN , 1 -Nil  TT , 11= PK00 . 

REWIND, PR00. 

LOAD,  l’ROO,  SUDS. 

EXECUTE. 


m 


7/8/9  Card 

***Preprocessor  source  cards*** 

7/8/9  Card 

***FORTRAN  program  source  cards*** 

7/8/9  Card 

***Subroutines  source  cards*** 

7/8/9  Card 
6/7/ 8/9  Card 

Another  simple  way  to  execute  a program  using  the  simulator  involves 

using  object  decks  for  the  preprocessor  and  the  subroutines.  A deck 

setup  to  execute  this  way  is  shown  below. 

INPUT,,, NBIT. 

REWIND, NB IT. 

FTN,I=NBIT. 

LOAD, LGO, INPUT. 

EXECUTE. 

7/8/9  Card 

***Preprocessor  objects*** 

7/8/9  Card 

***FORTRAN  program  source  cards*** 

7/8/9  Card 

***Subroutine  objects*** 

7/8/9  Card 
6/7/8/9  Card 

The  above  shown  job  setups  involve  using  large  card  decks  which  are 
rather  cumbersome.  If  several  executions  are  planned,  the  objects  for 
the  preprocessor  and  subroutines  should  be  put  on  permanent  files  where 
they  could  be  accessed  using  either  batch  or  remote  job  entry.  The  deck 
setup  shown  below  will  store  the  preprocessor  and  subroutine  objects  on 
permanent  files. 

REQUEST, OB JECT1,*PF. 

REQUEST, 0BJECT2 , *PF. 


COl’YHK,  INPUT, OBJECT], 

copybe,  input,  object:?. 

REWIND, OBJECT1. 

REWIND,  OBJECT2. 

cataloc.ohjkcti  .nbitsim.cy  i.rp--vI?o. 

CATALOG,  OBJ KCl'2 , Nil  I TS  1 M,  CY  2 , Rl’--  ] 20. 

7/8/9  Card 

***P i oprocessor  oh  \eel  s*** 

7/8/9  Card 

***Suhrout  tiif  objects*** 

7/8/9  Card 
b/7/8/9  Card 

Once  the  preprocessor  and  subroutine  objects  are  on  permanent  tile 

they  can  he  accessed  by  using  the  following;  deck  setup. 

ATTACH , OBJECT  1 , NH I TS I M, CY- 1 . 

ATTACH , 0UJECT2 ,NB 1 TS I M, CY»2. 

REWIND, 0BJECT1. 

RKW I ND, 0BJECT2 . 

OBJKCT1, , ,NBLT. 

REWIND, NU IT. 

FIN, I-NB1T. 

LOAD, LGO, 0BJE(T2 . 

EXECUTE. 

7/8/9  Card 

***EORTRAN  program  source  cards*** 

7/8/9  Card 
6/7/ 8/9  Card 

The  user  options  added  to  the  first  revision  make  it  even  more 
costly  to  simulate  a program.  Core  requirements  changed  only  to  the 
extent  that  the  subroutines  are  slightly  larger  and  each  call  to  the 
subroutines  contains  one  additional  parameter.  It  is  conjectured  that 
execution  time  will  increase  approximately  twenty  percent,  since  the 


subrout inos  are  longer  and  more  romp  lex. 


Appendix  1. 


N;J'j.!.S.irauji'!  pr  Sub rout  lues  Sourer  fisting. 

The  concept  of  using  2'.’  n-bJt  suhroul  Ines  to  handle  cacti  combination 
of  data  types  war  retained  from  tin-  original  simulator,  although  t ho 
code  in  tlio  subroutines  was  extensively  modified.  The  SKTNB1T  subrout  (in' 
was  also  extensively  modified  to  give  the  user  more  flexibility  with  the 
simulator.  Three  other  subroutines  called  HOllNDKK,  ONKTKNf,  and  TWOTRNt' 
were  added  to  handle  all  rounding,  sign  plus  one's  complement  truncation, 
and  sign  plus  two's  complement  truncation.  In  the  program  listing,  below, 
the  subroutines  are  arranged  into  four  major  categories.  The  SKTNHIT 
subroutine  comes  first,  followed  by  tin*  special  rounding  and  truncation 
routines.  The  third  group  is  composed  oi  those'  subroutines  which  handle 
only  integer  variables,  and  the  fourth  group  contains  those  subroutines 
which  handle  at  least  one  real  variable. 


lit. 


<">  o o oorsooooonrjoooononnnoooorsorsooonn 


stnt  c pirn  s:  t nt> i u >;-s,manisa .iguapo, ipx'nt, i pi res,  irmptr, 

YU  SSt.S  ,11  1X0, N1  Y , 1 X . . Y ) 

CUM  SION  XtY(fi) 

PIIICF  SION  TMYK ) 

P#TA  l>,XP0S/377V’7?/’7-'77777777  *n/ 

PM  A 1 KXK  f G / T 1 31 .i  C 0(1  7 o '■  P 0 0 C 0 P 0 (I  n " / 

pat  a n-Mnos/ro-ji-  oinr  f ocr  ocuo.’(v 

PA  1 A TMNNfG/?/7o.  77W  > 777/7777(1/ 

roiis  section 

THIS  SECTION  1 Ills  TM‘  INCH  I POP  IS  CAL  VALUtS 


Nuns  Nip) nr r oc  rtTi;  j;(  rtOM IMG  point  kopo 

l‘  Mil  S A KANT  3 Sr  A L ICTII  'VO  SHI.'  01  GUMD  PTi'TTS 

IGUARO  (HINT!  O'  G'l'KP  " I G I T n PS  C POP  TNTE'HfDlAlE  VALUES 

1 ( XI  NT  rXPTN  I T L'M  TH  11/  SIGN  PIT 

in r os  o • u rv  i r right 

1H.TTK  r<  = Tt  im\Tr  1 - ROUND 

ICMTWO  1 = ("I  S COM'LfM'NT  Z - TVPS  COMPLEMENT 

nvrr  1 - pin.v  y ?.  - ouateknau  3 = oc.t/l  <>  = hexadecimal 

MfSSGS  P S SUE  DRESS’  ON  1 - PRINT 

IfIXD  NUMNEI  or  UTS  IN  EIXEP  POlMl  HERO 

<iy  Ats  ay  to  r t i i ro:  oupr  suproimini  s 
k i y ( i ) max  '-osiim  uxrP  rein  VALUE 

KfY(?>  MAX  .NEGATIVE  I 1 X' 0 POINT  VALUE 

CfYI.M  1 1 NOT .1 
UYO.)  I ONI  WO 
K t Y ( (* ) MPSSGS 

K'A(<>  A 1 N T L MANTISSA  HITS  SAVCD  (MAXIS A 1 

Kf Y ( 7 ) 1 NT ' i ‘If  PI  ATI  EXT!  A PITS  S 4 V I J (IGUARO). ■ 

K(Y(P)  I TYPE 


TKPY  ARRAY  TO  ft  LI 

TKEYCl)  MAX  °('UTIVr 

TX(Y(?)  KAX  Ml  ".ATI "f 

IK' A (3)  MIN  °!,  " I T I V ‘ 
TKt  Y (<. ) PIN  ME GAT  I V E 


FOr  0 T MrlT  SUPP.OUT  I Nt  S 
FLOATING  POINT  VALUE 
FLOATING  POINT  VAllir 
rLC  A T TNG  rni'T  VALUE 
FLOATING  POINT  VALUE 


IF 

<MrITS  .GT.GU 

0.0  7(1 

.’IP 

IF 

( (MA'IT  SA.G1  . ,f 

) .00. 

{ M ANT S A . L'  , 1 ) ) 

GO 

TD 

215 

IF 

< (H  N’MT.NE.G) 

.ATT. 

nni.-eTN.ir. 5)  ) 

GO 

10 

??p 

II 

( (NPII  s.ro.eu 

.An . 

fXF.NQi  " . T V . S Y ) 

r.o 

TO 

IF 

< C I C'E!  WO.N  11 

• A'n. 

(IONTWO.M  . m 

GO 

TCI 

030 

ir 

( (tCIHKO.l  T . 1) 

.10,1 

* G p A r n.GT. 

PAN 

1 s 

n ) 

11 

( (TTYPE.Nl  . 1 ) . 

ad,  < 

• 1 Y'T  . VF  . ’)  , A,  ‘|P  , II  T Y 

• r .n 

(71  YPE  .N:  .!>)  ) 

(.0  TO 

PGP 

ir 

(CPF  NSGS.NE.il 

.AN  1 » 

(MfSEor.  .ir.m 

GO 

TO 

?<  *•> 

IF 

( (If  XPNT.GT . U ) . OR. 

O 1 XFN  T . L . 1 ) ) 

GO 

10 

?■  p 

IF 

( ( 1 'XTN1  * I'M 

T G A ♦ 

i ) .No.irus) 

r.o  ; 

r o 

?\  r 

IF 

( ( I! TPOG.N".  1) 

. and. 

(IPTFOS.N- .1)  ) 

GO 

T3 

?00 

IF 

(1EIXP.GT  ,h(.) 

r.O  TO 

?cr> 

t NO  fPITS  SECTION 


XCY(1>  = SHIFT  CM/. SK(1)  ,IFIXO) -1 
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key  ( ;-  > = -key  it) 

IF  ( (ICHTK  UtT.2)  .AND.  UFIXD.NC.tP)  > KEY<?)  = Kf.  Y ( 2 ) - 1 
KE  Y ( ? ) a IRNIUR 
Kf  Y (l  ) = IONTWO 

i-'Ydi  = m scgs 

KEY  ((  ) *-  HANT  SA 
KEY  <7  >=IGUM?P 
KEY(f)  = I TYPE 
C 

C Fill  ri GATING  °OIMT  OVrRFI OW  A NO  UNPPRFIOW  CHICKS 

c it  iint.Ii  now  r:m'its,  7i.ro  win.  or  os-it 

C If  OVERFLOW  RESULTS,  MAX  COC  VALUiS  Hill  lit  USED 

c 

C THESE  CHICKS  ATI  GOOD  ONLY  rCR  COMPUTERS  WITH  UNBIASED  EXPONENTS 

C 

C STAFT  WITH  MAX  A tIH  Mill  COC  VALUIS 

TKEY ( 1 ) =T HXPOS 
TKCY  < 7) sTHXNIO 
TKEY (?) =T HNPOS 
TKf  Y(A ) =T HYNES 

C CI.rMf  MASK  Of  *1 A NT  IS0  A PITS  TO  SAVE 

TF M'l  = SMi  F T (M  ASK  ( HANTS-* ) 

c arm  pax  Exporrur 

iTri!F=-/,3«iTY:,r',<?»  * < x : xpmt-d-d 

C AC JUST  FCT  LOCATION  0'  P1NARY  POINT 

TF  (irTPOS.FO.l)  ITLMP-1T  EHfMMAIITSA 
JT FHf  -ICCii 

C TRANSFER  IMPLIES  CTO  VALUES  ARF  USED 

TF  IT  T f MP.Gf . JTEFPJ  F.O  TO  ' 0 

c ccnr  to  handle  t.oc  exponent  hamolthg 

C l i'Ft  F IS  DTHAKY  IX^ONffiT  AS  INTI  GIR 

IF  (ITCMF.IT.3)  JTE  <P=  JTE.MP-1 
IT  ERF *IT£ MP»JTEMP 

C SHUT  EXPONENT  INTO  POSITION 

TEff  7 = SHI  n (IT  F'ir  tt)  ■ \ ) 

C Ct.t  / T t RAX  PDStT  TVS  FLOATING  POINT  VALUE 

TK  ( Y ( 1 ) s'*  l HP? • OR.  TEMPI 

C CREME  MAX  NEGATIVE  FIOATIHG  POINT  VALUE 

TKEY ( ?) s-TKCY ( 1 ) 

C SKIP  SPECIAL  CASE  FOP  T WOS  COMMITMENT 

IF  UCKTWO.ro, 1)  GO  TO  HP 
C CTFATE  UNNOR'IALT  ?n  ADJUSTMENT 

Ul  Po=SHIFT(MASK(1)  ,«.o-HANTSA» 

T F. MM.  sTEI'P? . OR.  rt  MP3 
C ADJUST  1 K E Y ( ? ) 

TKJ  Y (?) =T KEY ( ?) -1  t MPA 
C Cf  LATE  rANUSEA  FOR  UNOFRFLOW 

EG  TEt P1=SHIFT(MASK<1) ,UB) 

C Cf.rATf  MIN  IXPO'fNT  (>-AX  NEGATIVE) 

ITI  NFs-AWI  Y " ' ( ? * * ( I "XPNT  - 1 ) > 

C ADJUST  PC?  TWO!  COM°L'MENT 

IF  (ICKTKO.EQ.l)  TT:«PrITE‘.P«ITYPr 
C AfiJI'ST  FC>  LOCATION  O'  ‘UNARY  ‘'PINT 

TF  (1FTP0S.E0. 1)  ITEMosITUiPfHAMTSA  . . 

JT  E IT  = 10  23 

C R I T URN  IMPLIES  CPC  VJIUIS  ARE  USEO 

IF  (JTENP.LT.-JTEM0)  F'rl  URN 
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no  o o ooon  on 


coni  to  H'.K'nti  me  fx"on(ni  hvioi  hu; 

1HM  IS  ° I N \ P Y IX^ON’N)  AS  INTtilR 
iniiM"  (‘tsju'ic 

jMin  fxfonlnt  into  position 

T(  MM  fin  rj  n I r<r 

rum  min  post:  tv  floating  point  vaiut 

1*1  V ( ? ) - T I MP?,C  J.T  HIM 

ri.t/H  pin  v:uAiivr  iloating  point  vaum 
TM'  <<  ) -T  O Y(M 

SNIP  SPSCTAL  CAM  rP(  TKi'S  COMPLIMENT 
jr  (iCNTWo.ro. i > nttupi 
CM'.*  If  lINNOn.U)  ' 0 /TJt'ST.Mt  NT 
T t T l J*  S U I P I ( V T S < t \ ) > • '*  » M A MTS  A ) 

TFTlPI  »’  1 ENP? .0  ( . 1 t Mp  1 
Al  Ol'ST  T K1  Y <L) 

T W t Y ( m ) -T  SLY  ('»>  -1  I'MPI, 

M TURN 


?ip  r»jM* 

STOP 

?i*  print* 

STOP 

??o  ri.’irn» 
S ' op 

??s  mm* 

ST  OP 

? sc  prim* 
FRItn* 

STOP 

?sr  point  * 

STOP 

2‘  n f f TNI  • 
PP1NT  * 
STOP 

?< ',  flint* 

STOP 

?T  0 FRIHT  * 
STOP 

?■  P F K I NT' 
POINT * 
fsjnt* 
PSlNT* 
STOP 

If*  r*') nt* 

STOP 
r i- 1 nt  * 
STOP 
(NO 


"N  HI  T S TOO  ‘T  TO",  NOTTS 

""AO  MANTISSA  LTHOTU  ",  MAPI'S  A 

"OAO  ROUNO/T  RtlNf  AT OPTION  ”,1KKPTR 

"CANT  ROUND  H/  (0  PITS  ",  NsITS 

"HA  CHI  N!  ONLY  POTS  PN[S  Os  HITS  COMP  SU'll '*• 
"fON  SIGN-  MAG'fl ' UIH  , USD  ONI  S COMP  ",IONTLO 

"0 AO  GUARD  OTGTTS  ",  I GUARD 

"PAD  MACHINE  TYrr  SMCIFlrn  " , I TYPP 
" 1 - ° 5 N M . Y P = OUAT  [SNAKY  3-*  OCTAL  T Ur\AfUOIMAL 

"0  AO  MOSS  A Of:  OPTION  NtOU'ST.O  ",MSSGS 

"GAO  TYCOM  NT  ",  1 ( XPNT 

••EXPONENT  ANT  MANTSA  APT  NOT  C OMpA  IT  PL!  " 

“"ANTST  ♦ ITXPNT  * 1 = N01 1$  " 

"UXPNf  = " 1 1 f YPNT 

"HANTS A = ", MANTSA 

"SAD  POINT  POSITION  P.i  0ULST''0  ",  T PT  P OS 
"PAO  FIXEP  POT N’’  WOROl.f  NOTH  ",iriYO 
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SUDKOUTJ  Kl  ONCTRNC(RRR|ir  Nl  , K I V) 

SUBROUTINE  TO  ADJUST  All  MANTISSAS  10  PROP!  R I l MOTH 

USIO  IJMFN  S 1 MO  l A T I KG  MACHINES  UNI  CM  UP:  SIGN  f'l  US  ONES  COMPl  l MINT 
Rl  PRESENT  AT  ION  OF  HOMING  POINT  KUM’URS  Kl  111  TRUNCATION 

DIMENSION  KEY(8> 

C mask  to  dick  orr  tor  ixroNruT  amo  sign 

OATA  IKNCl/T/TTOOCOlOOrOOCCOOCO"/ 

C CI.AK.l  FXPONPN1  10  INTEGER  DISCARD  NANT1SSA 

lAI)JST‘-SUirT  (KRR,  -A  M 

C ADJUST  IMII.K  r0R  SIGN  OF  MANTISSA 

IF  (RPR. IT. 0.)  IA'KISl  .NOl.lAUJSr 
C ADJUST  IIR.:GF.\  FOR  FOC  < XPCKTNT  Rf  PRF  SI  Ml  AT  1 ON  SCUl.lt!: 

IF  (1  AI1JS  l.Ol.l  3? A)  JAUJST  J0‘  o-I  f.DJSl 
IF  ( J AOJSl  .IT  . 1 JACJST' rot’K-TAOJST 

C ADJUST  Mi  NT  FOS  MACH  I N ! RADIK 

JADJST  MODE JATJST , s- V ( )) 

C comm  NUN  DC  R O'-  PI  IS  TO  truncate 

KADJST  - JAPJST  ♦ 4 S -Rl  Y ( (■> ) 

C ADJUST  FOR  KEEPING  GUARD  PITS  ON  T NT  * RM  l 01  AT  F CXPRISSIONS 

TF  (JFKl.iQ.O)  KADJST- KADJST -KPY I 7 ) 

C TRUNCATE  I'Y  SHIFTING  riKST  RIGHT i 1MCN  LFFT 

RRR-SKlf  T (S'liri  ( IRR,  -KADJST)  , KADJST) 

NET  URN 
END 


o o n o o o o oo  no  non  o n n n n ooooon 


SUPR0UT1NE  T WOTRNC ( RRR, I FNL  ,KIYT 


SUBROUTINE  TO  AOUIST  All  MANTISSAS  TO  AROPER  LENGTH 

USED  WHEN  SIMULATING  MATH  IN!  ' WHICH  USE  SION  PIUS  TWOS  COMPLEMENT 
REPRESENTATION  OF  FLOATING  POINT  NU.M'U  RS  WITH  TRUNCATION 

PI Ht NS  I ON  KEY (6) 

mask  to  pick  orr  enc  exponent  ant  sigh 

DATA  TNNCl/TA?70r00a00i  P 0 DC  0 0 0 0 'V 
MASK  TO  PICK  OFF  CPC  MANTISSA 
PA T A TRNC2/0&J  177  7 7 ”>77  77777777')/ 

CMANC.l  EXPONENT  TO  INTEGER  DISCARD  MANTISSA 
I APJST • SMIl m R, -US) 

ADJUST  INI E OP \ r OR  SIGN  OP  MANTISSA 
IF  (FKK.LT.O.)  I A OJST= , NO  T. 1 A 0 J > T 

ADJUST  INGEGv1  FOR  CPC.  IXPONiNT  representation  scheme 
ir  U APJST.  Gt  .)  A pi.)  JAM  JST  = ?C  ' E-  I APJST 
ir  UAPJST.LT. 137I.T  japjst  = 7ov'. -iadjst 
ADJUSTMENT  FOR  MACHINE  RADIX 
JAfIJST  MUP(JA:)JS»  ,K  ! Y (A  ) > 

COMPUTE  NUT)- ‘ OF  PITS  10  TRUNCATE 
KAPJST - JAPJSTeu <-KFy (0) 

ADJUST  FOR  K'  f PING  GUARD  PITS’  ON  INTERMEDIATE  EXPRESSIONS 
IE  (lrNL.fO.C)  KAPJSTi) ADJST-Ki Y(7) 

POSITIVE  NUNS'  R IS  READY  TO  ’’F  TRUNCATED 
NIGAT1VE  NUTTER  MUST  PE  TRUNRATEP  TOWARD  MORE  NEGATIVE 
IF  (PRR.GE.r.)  GO  TO  1? 

Nl XT  S LINES  ADJUST  RRR  SO  IT  PAN  PE  TRUNCATED  USING  SHIFTS 
PICK  OTF  SIGN  AND  EXPONENT  KITE*  MASK 
TRNCJ-TRNC1 .ANO.RRR 

CRfATf  MASK  Of  MUMPER  OF  PITS  TO  SAVE  (POTH  EXPONENT  AND  MANTISSA) 
TRNCR--T  ASKfOO-KAO  1ST) 

MASK  KITH  MANTISSA  fOSITIONS  TO  GET  A STRING  OF 
ONI S IN  MANTISSA  POSITIONS  10  PC  SAVED 
71  KPS  IN  POSITIONS  10  BE  DISCARDED 
TRNCA  -1  RNCU . A‘|P . TRNC2 

OP  IN  SIGN  ANT  EXf’CHfNT 
TRNCL-TRNPU.OR.TMJCT 

SIGN  IS  NEGATIVE,  SC  T RUN  CM  TS  AN  LINN  OR  MALI  7ED  NEGATIVE  HUH  (TER 
ADJUST  RRR  DOWNWARD  (MORE  NEGATIVE) 

RRR-TKNCU  *RRR 

C TRUNCATE  ETY  SHIFTING  FIRST  FIGHT,  THEN  LffT 

10  RKKrSHIET (SHIFT ( RRR, -KAOJST  ) ,KAOJST> 

RET UP N 
END 


a o o o n n o o o o o o o r>  o o o o o o o n oaoooooo 


SUPROUTINi  ROUND!  RtRRR,  1 f Nl  , KEY) 


SUPROUTINt  10  ADJUST  ALl  MANTISSAS  TO  ’KOI  M'  U Ni.lH 

USED  WHEN  SIMULATING  MAE  HI  Nl"  S UUICM  II?-  SI  PR  Ill'S  ONES  COIO’L  CHI  NT 

ok  si  mi  itus  runs  rotiric.MENT  Rt rst sfhiations  or  floating  point 
NUMPCRS  WITH  POUNDING 

ROUNDING  IS  oour  n>  A OPING  ANtl  thin  If.lNCAtlNG 
Pint  NS I ON  KTYtFl 

mask  to  dick  orr  roc  rxroNruT  and  sics 
PA  1 A Ii:NCl/7AAAJDO£ITCC"OCCircaO?:T/ 

MASK  TO  PI OK  oi  F CPC  MANTISSA 
DATA  T R N 0 7 / r 0 3 0 7 . ' 7 ’ ' ~ 7 ' 7 7 7 7 / 7 7 ’ I / 

CHANG!  IXMVTT  TO  I N 1 : o S. K OISCAKO  MANTISSA 
I AP  JS 1 * SMI F V ( R ' R • -AH 

APJUS1  lNTIolR  I OR  S I OS  Of  MANTISSA 
IF  <1  KR.i.T.0.1  UPJST=.N0T.1A0JST 

ADJUST  IMG;  I'l  l TOR  Ct)C  i M'CN  MT  Rf  PRi.  STMT  ATI  ON  SCHEME 
IF  tIAPJST  .At  . I 371.)  J A DJS 1 r JT  ;•  t>-  J A 1 J S T 
IF  Cl ADJST.Ll  . J JP-.)  JA^JST  JCH-lfOJST 
AOJUSTM!  KT  F,”  MAO  MI  N!  RAOJX 
JAP. 1ST  MOO  ( JA0.I5T  ,Kr»M  I 

COMPUTE  IUH'Vi;  Oc  PITS  TO  TRUNCAT  ( 

KADJST - JAOJST  ti-  a-k<  y tf>) 

AP.'UST  FOR  N'UING  GUARD  HIS  ON  I NTERMEPI  AT  E EXPRESSIONS 
IF  11IML.  10.01  v.AOJST -Kfcn.l  T-NTY17) 

CPEA1E  MASS  OF  N'JM'T • R O'  0 11  S 10  USE  IN  RCUNDING 
OKI  MOKr  THAN  MONME P TO  SAVE 
STEM!  »PASK<Gl-KA->  JST) 

CM  Alt  MANTISSA  OF  l NNO\MALl*rP  NUMOrr  TO  A 00 
TTFHP=SUiri ("ASK! 1 1 , RADJST) 

STOARAIE  COD r IOR  MEG  AT  1 V(  MUMPERS 
ir  (KKR.IT.O.)  r.C  TO 

tick  orr  coc  fits  or  norp  to  use  i or  pounoing 
RRRrKNR.ANLl.STT  Mr 
60  TO  Dtl 

C.MANGt  MANTISSA  Or  UNNORMAl 1 M P NUM51ER  TO  NlGATlVE 
10  HEMP*.  NOT.  TTLfP 
ST  t MT «. NOT. SIT  MR 

pick  orr  coc  aits  or  woro  to  usr  ros  rounding  kith  trailing  ones 

RRk-  KKK.OR.STEMP 

7TR0  OUT  ( XPONT’MT  FIFLO 
TTf MP=TT(  MP.ANT1.1  KNC? 

COPT  'OR  All  SUM  Tf RS 
J SOI  AT l MANTISSA  ri'LO 
?0  TRNCO- KR*  .ANO. 1 RNO  1 

OR  IN  EXrONINT  FirLr 
TRUCK*!  RNC 1 .OS  , T 1 ■ *«’ 

APO  UNNC.fmt.T7Ft)  NUt'D f K 
RRR - T K NC *.  * RKR 

ROUS'D  I'SJ1'  ' S>‘7CTS  TO  IPUf.'C ' T 

TKUNCA1E  IT  SHI FT I NO  FIRST  RIGHT,  THEN  l T FT 
RRRrSMIFT (SMITT (RRR,-KAOJST)  , KADJSI) 

RETURN 

END 
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INTEGER  FUNCTION  IIADOIII, 12, ROUTINE, LKCtU, IFNI  , KEY  ,TKEY) 
OIlFfSION  T KEY  ( i* ) 

PlUfl'S)  ON  KEY  ( 6 ) 

1IAPP= 1 1*1? 

IF  (If  r.VAP(JIAmi  .NE.P)  GOTO  no 

if  (ilAon.r.T.KrYdn  r.o  to  lo  . — - — . . 

IF  (IIAOO.UT. KfY(2)>  GO  TO  60 

RETUF  N - - — 

1.0  1 1 AOC  r KEY  ( 1 ) 

IF  (KEY  (r  ) . FO.  0)  RETURN  — - — 

PRINT  (1,2-<C)  POUT!  Nl  , L NCNT 

RETURN  

GO  lIAOr=KEY (?) 

IF  (KEY <3 ) . EO. 0)  RETURN 
PRINT  (1,200)  ROUTINE , L NCNT 

RET  UF  N ...  

100  PRINT* ."SORRY,  TURKEY  I CANT  CONTINUE” 

PRINT  <1,  1EP)  IIAOP  ... 

1E0  FORMAT  (1IIC,P?0) 

2L0  EOF  MAT  (lH;,"mi!)  ’OSITIVF  OVERFLOW  LINE  = ”,15: 

26  0 FORMAT  (1HC,"IA33N  NEGATIVE  OVERFLOW  ”,65,”  LINE  = ”,1G 
STOP 
END 

INTEGER  FUNCTION  I HNS (II ,1 2 , ROUTINE , L NCNT , T FNL , KEY, TKEY) 
DIMENSION  TKEY  ( ;>) 

DIMENSION  KEY ( S) 
llfiNE  = 11-1? 

IF  (l E6VAR { IIMNS) .HE • 0)  GOTO  100 

IT  (IIMNS.GT.KEY(I)  ) GO  TO  LO  . 

IF  (1IMHS.LT. <EY(?) ) GO  TO  GO 

PETUF  K - 

AO  I INNS -KEY  ( 1) 

IF  (KEY (6) . EO. 0)  RETURN  

PRINT  (l,?/.0)  ROUTINE, l NCNT 

RETURN  - - - - 

GO  1 1 NNS  = KEY ( ?) 

IF  (KEY (5 ) . EO. C)  RETURN  ... 

PRINT  ( 1 , 260)  ROUTINE , L NCNT 
RETURN 

100  PF JNT* , "SCrRY, TU' KEY  I CANT  CONTINUE" 

PRINT (1, 1PC)  1 1 NNS  . . ...... 

1EP  FORMAT  (1H!),02?) 

2*0  FORMAT  (1H0,”IIM>S  POSITIVE  OVERFLOW  ”,A5,”  LINE  = ",I5 
2G0  FORMAT  (1H0,"IIEYP  NEGATIVE  OVERFLOW  ",A5,"  LINE  = ",IC 
STOP 
END 

IN1EGEK  FUNCTION  IIMPY ( II >12 , ROUTINE , L NCNT , I FHL , KEY, TKEY) 
DIMENSION  TKEY ( ♦) 

01  MET  SION  KEY ( 6) 

1 1 MP  Y = 1 1*  I? 

IF  (I f G7m  R ( IT  NDY) . HE . 0 ) GOTO  100 

IF  (III  PY.GT.KEY(l)  ) GO  TO  40  • 

IF  (1IMPY.LT. <EY(2)>  GG  TO  GO 

RET  UF  N ‘ 

AC  11MPY=KEY(1) 

IF  (KEY (I ) . EO. 0)  RETURN  . . 

PRINT  (It  ZAO)  ROUTINE, LHCNT 
RETUF V 

GO  I I M°Y =KEY (2) 

IF  (KEY  ( '.  ) , EO.  0)  RETURN 
PRINT  (1,200)  ROUTINE, LNCNT 
RETURN 

IPO  PRINT' , "SORRY, TU- KEY  I CANT  CONTINUE" 

PRINT  (1,150)  IIMFY 
lfP  FORMAT  (1  HO, 020 

2-*' 0 POF.MAT  (1U?,"IIMPY  POSITIVE  OVEpRLOW  ",  (I 3,”  LINE  = “,IF 
2CP  FOG  MAT  ( INC , "I  I DVR  NEGATIVE  OVE^FIOW  ”,M,"  LINE  - ",I0 


INTF  C-FR  FUNCTION  II  1VIM  I 1 , 1 2 , KO'IT  I Ur , L NT  SI  , J FKl  , NEY.TKLV ) 

C]K« i mol  tkf  r (-.) 

MPEfSION  KEY  (8) 

novr=n/i? 

IF  (LFOVARdlOVO)  .NS.O)  GOTO  100 
IF  (HDVO.GT.KFYd)  ) GO  10  U 

if  (nrvo.Li.<Ev<?n  go  io  go 
fftumi 

to  TIOVr*KF> <1) 

IF  (KIT  (F  ) .1  O,  0)  RETURN 
F?1M  (1,240)  KOUTINF tl  NCNI 
RFTUK N 

» 0 llf.Vl'f  KFV  (.•’) 

IF  <kfY(F)  ,'O.C)  FETUk'N 
PRINT  (1.2GC)  ROOT  IT,  INCUT 
RETURN 

Iff*  PRINT*  ,”SOcrY,TU-  KEY  I CANT  CONTINUE” 

pkiut  (i,  iro*  mvo 

l*i  * C ) MAT  ( 1 H A , 0 ? 0 ) 

2‘  r roi  mat  ( me, "novo  "os’Tivr  overflow  ", ut.r  = ”,iF) 

rto  FCt  MM  tl  HO,  “II  ICY  NEGATIVE  OVi  ''HOW  ” , A $ , ” IHE  = “,I!i) 
STOP 
END 

INTEOEf  FUNCTION  IT”  XT  (II, 12,  ROOT  I NC , I.  KC  N T , J FNL  , Kl Y,1 KEY) 
DIMENSION  TKE Y ( 4) 

01  ME  CM  ON  KtY(8) 

nr xi -n  *i? 

IF  (L FGV1R(IUXP> .NE, 0)  GOTO  100 
IF  (lltXP.GT.KEYd)  ) GO  TOGO 

if  mrxp.iT.KFYcan  go  to  go 

KETlIf  N 

*•  G 1 IF  XP  = Kf Y ( 1) 

IF  (X  t Y (’  I.EO.O)  RETURN 
rRIUT  <l,2sfl>  ROUTINE, LNCNT 
RETURN 

GO  IIEXF=KEY(2> 

IF  (K E Y C ) . EO.  0)  P.FTtlRk' 

PRINT  ( 1 , ?tC)  ROUT  I Nr , LNCNT 
ptTUf N 

lFO  FaiNV, "SORRY,  TURKEY  I CANT  CONTINUE" 

PRINT (1,150)  1 1 EXP 
l'f  rC'M/T  (1M 0,020) 

2*3  F Ot  MAT  t 1H0 , “I j 'XP  "OS’TIVE  OVETl  OW  LINE  = ",1'.-) 

2Gf  FORMAT  ( 1 M3  ,"I  INN'S  NEGATIVE  OVERFLOW  ”,A5,"  LINE  - ",15) 
STOP 
END 

INYfCru  FUNCTION  IASGFKIi, ROUTINE, LNCNT, IrM ,KI  Y,1 KEY) 
CIMI  SIOn  TKEY  (4) 

OTPFR  SI  ON  KEY ( P) 

I«SGN=I1 

IF  U EC VARtlASGN) .ME.")  GOTO  130 
IF  flASGN.GT.KFY(l) ) GO  TO  40 
IF  (1ASGN.LT .KEY{ ?) ) GO  TO  GO 
FETUR  N 

40  IASGF-KrY (1) 

T c <KfY(‘  ).r0.rt)  ittiion 
FPI’-'T  ( 1, 24  0 ROUTINE, LNCNT 
RFIUI N 

fO  IASGI  “KEY (2) 

IF  (YET  ('.  ) .EO.O)  (“TURN 
r»lf!I  (1,260)  ROUTINE, LNCNT 
FEIUf  N 

1PC  r»INT*, "SORRY, T'-TKEY  I CANT  CONTINUE" 

EKNT  (1,153)  IASGN 
lri  r 0 f NAT  ( 1 HO, 020 ) 

2'  f ro  MAT  ( l)n,"IASGf!  POSITIVE  OVFRFIOW  HUE  = ”,15) 

2d,  pCMl  (DO, "lire  NEC ' T I VI  OV.TELIW  LINE  = ",I9> 

rt;o 
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pfu  » unction  rraoperi , R2, routine, lnc  NT; i rn  ,»av,  tkcyj 

DIMM  ST 0 1 KE  Y ( 8 ) 

DIMENSION  TKF.TU) 

EPA0P=R1 »R2  

IF  <KCY<  O.FR.l)  CALL  1 OUNPCR  (P.^ATO,  I FUl , < t Y) 

IF  ( C K E Y ( 3 ) « FI  . J ) .AMO,  ( K t Y ( *> ) • c.r> . ? ) ) CAU  t HOT !>HC  ( R.RAOO , IFNL  , < CYI 

IF  ( < K t Y(  3)  .FT.  I)  .AT).  KE  Y f «♦  > .CO.l ) ) CALL  CNtl  Ri(C  ( RRAOO,  IT  NL  , KEY) 

IF  (rR/.OD.CT.TKt  Y (1)  ) CO  TO  ?J 

ir  (I  RAID. IT  . TKEY  (0)  ' r0  TO  AO 

IT  ( (RRA0P.L1 • TN(  Y(  T)  ) .AMP.  (RRAOD. GT . 0.) ) 

♦ CC  10  60 

IF  ( (Ft.  A!lO.  GT  .T<!  Y (A)).  ANO.  (RRARP.LT.  0.)  ) 

* GO  TO  60 
RETURN 

?0  PR/Ot-TKEY(l) 

IF  (KEY(-).n.O)  frTUR>.’ 
f RIHT  (1,220  ROUT! 'If  , INCUT 

ET1  IK  N >•  ....  . 

AO  RRA Or  = 1 Y E Y ( P) 

IF  ( Vi  Y (C  ) . Fr' . 0)  f.f  TURN 


PR ) NT  ( 1 , C ) ROUTINE, LNCNT 
RtTUF.N 

60  RRA 00  = A, D 

IF  ( K F Y { t ) , F 0 , 0 ) (••TURN 
PRINT  (1,260)  ROUTINF, ( HCNT 
Rfltir  N 

60  TRAOr-O.O 

IF  (VEY(U).E'.Q.O)  EFTtlRN 
PRINT  (l,?o',>  ROUT  I '4  F , L NCNT 
RETURN 

??0  FOkMAT  ( lMn,"-F':Arti  °0E 1 T I VF.  OVF~rLOW 


l INF  - • 
l INF  - 1 
LINS  = 
LINE  = 


220  FOrRAT  ( 1H3 , ,,-l<;Art'  ^ORUIVF.  OVF~riOW  " , A A . •*  UKif  ■=  •• 

2<  0 FORMAT  (lHI,"TRACn  *1  - G T I Vf  OVF'’i‘L  OW  ’*,  A ? , " l INF  - 

2A  0 F Of  MA  T OKIC’TdO  °0S"T1VF  UNI'  “TlOW  LINE  = ' 

?f  0 FORMAT  ( 1 HT , "^RA^O  NEGATIVE  UNOFRFLON  ",AP,"  LINE  = ' 

LNO 

RFAL  FUNCTION  RlAOPtll  , R?  , ROUT  IMF  , LNC  NT , IFNL  ,K'FY,TKEY> 
01  HE  LSI OS)  KEYd) 

DIMENSION  TKEYC.) 

RlAOr=31*R2 

IF  (Kf Y(3).L0.1)  CALL  ROUNDER ( R 1 A 00,  I FNL , K EY) 


IF  ( ( K t Y { Jj.EO.O  .ANO.  (KLY(4)  .E'l.2>)  CALL 
IF  ( (KEY< 3) .FT. D .ANO.  (Kf Y(A> .FO.l) ) CALL 
IF  (f  UOO.GT.TKEY  (1)  ) CO  TO  20 
IF  (MA0O.LT.TKEY  (2)  ) CO  TO  NO 
IF  ( (K1AP0.LT .IKE  Y( T> ) . ANO. (R1A0D.GT. 0.) ) 

* GC  TO  60 

IF  ( (P1AO0.GT.TKE Y(. )) .AND. (E1AD0.LT, 0. I ) 

* GC  TO  6? 

EFT  URN 

?0  K1A0RST KEY ( 1) 

IF  (KFYC  ) .( 0.0)  RETURN 
PRINT  (1,  220)  R.3UT I NF , LNCNT 
RE1UFN 

AC  RlA0r=TKEY(2) 

IF  ( K f Y ( > ) ,rO.O)  RETURN 
PRINT  (1,2^01  ROUTINE, LNCNT 
finUMJ 

60  RIADO^O.: 

IF  (K£Y(:> ) .f  0.  0)  FFTUPH 
FRINT  (1,200  ROUTINE, LNCNT 
RETUF  N 
60  F1ADC=C.0 

IF  (KEYC,  I.FC.O)  PFTURII 
PRINT  (1,230  ROUTIN'",  LNCNT 
RETUFN 

220  FC»MA  T ( 1 Ml  ,"R  1 TOO  R0E"T1V[‘  OVC^FIOW  ”,f 
?l  C F 0 1 EAT  UH.|,"UT.  1 N?  G * T I Vi  07f"'rl  VV  ",  A 

rto  rof  mat  (i'i:,"»noo  vn’.  c uno-tlow 

260  F CANA  T ( 1 HO,  'XI  ADD  NEGATIVE  UNOEvE  LOW  •• , 
ENO 


CALL  T WOT f NO ( R1 A CD, IF  NL ,KEY) 
CALL  CNET  :NC(Ri  ADD,  IFNL,  KE.Y) 


l INF  = ",I5> 
LINE  = ”,I,.>') 
LINE  t 

l 1 NE  r 15  ) 


PEAL  FUNCTION  R2  A HO  ( Ri  , I ? , ROUTINr , LNC  NT , I FNL  ,K( Y.TKEY) 

OIRDMON  KEY(R) 

DIPC*  ; ION  T KE  Y ( A) 

R2AO0*m«I2 

IF  (FEY<3).( O.l)  CALI  FOUNDER ( HO,  I FNL  , K FY ) 

IF  C (KEY  (3)  .fcN.O)  .AND.  (KFY  (>t ) .C".?))  CALL  T WOT  MIC  t s?A  OD,  I FNL  , KEY) 
IF  ( (KCY(  3)  .ED.  0)  . AND.  (KEY('i)  .ED.  D > CALL  ON  Cl  1 NC  ( K2ADD,  IFNL  , KEY ) 
IF  (f  2ADD • CT . T KEY (1  ) ) CO  TO  20 
IF  (I  2AOD.L1  .KEY  (?)  ) 00  TO  AC 
IF  ( (F  PA 00 .LT.TKFY(I)  ) .AND.  (R2AUP.GT.  0. ) 1 

* GO  TO  GO 

IT  (<R?At)D.GT.TKCYC»)  ) .AND.  (R2ADD. l.T.  0. ) ) 

* GO  TO  30 

peiufn 

2C  k2AOr--TKCY(l) 

IF  (KCY(G).EO.O)  RETURN 
PRINT  (3.220)  ROUTINE i LNCNT 

RE1URN  ...  ...  — - - .....  ... 

AO  » 2ADr.-TKEY  (2) 

IF  m YC  ). CO.  0)  RETURN 
PRINT  (1.2T0)  ROUTINE, LNCHT 
PEIUFN 

GO  R2/.DCF  c.O 

IF  (KEYED .EO.O)  PETURN  ....  ...  . 

PRINT  (3,2^0)  ROUTINE, LNCNT 

RFTUFN  

to  K?AOr=C.O 

IF  (KEY(E).EO.O)  RETURN 
PRINT  (3,200)  ROUTINE, LNCNT 
PE  TURN 

220  T Of  MAT  (1HO,M:,.?ARO  °OSTTIVE  OVEOEI  ON  ",AS,"  LINE  = ",I5) 

2<  r PORI' A T (3  NT,  'X2A  (0  NEGATIVE  OVERFLOW  "tHt"  LINE  = ",1C> 

2(0  FOR", AT  { 1H0  ,*‘T2  ADD  “OSITIVE  UNDERFLOW  ”,A9,"  LINE  = *',I5) 

2(0  fCIHAl  ( 3 HO , "T2AD0  NEGATIVE  UNDERFLOW  ",A3,"  LINE  = **,  ItJ ) 

f NO 

REAL  FUNCTION  RR"NS ( R1 , R2, ROUTINE , LNCNT, I FNL , KEY, TKFY) 

01  HI  NS)  ON  KFY  (9  > 

DIKE.  l-S  I ON  TKEY(A) 

RRHNf-  = Rl-R2 

IF  <KtY(3).E0.1>  CALL  "OUNOER ( RnMNS,  I FNL  , '<  EY) 

IF  (<KFY(3).ED.I)  .AND.  (KEY(L ),En.?))  CALL  1 HOT  F NC ( RRMMS, I FNL,  KEY) 
IF  ((K(Y(3).ET.O)  .AND.  < KEY(  *>)  . En. 1 ) ) CALL  CN  F.TRNC  ( RRMMS  , I FNL  , KEY  ) 
IF  (FRHHS.GT.TKEY (1) ) CO  TC  20 
IF  <F  RI'NS.LT.TKEY  (2)  ) CO  TO  liO 
IF  ( (Rl  MNS.LT  .TKEY(  3)  ) . AND.  (Rf<MNS.GT.  0.)  ) 

* GO  TO  GO 

IF  ( (RRMMS.GT.TKCYU)  ) .AND.  (RRMNS.LT.  0.)  ) 

* GO  TO  81 
RETURN 

20  RRHNi =TKE Y ( 1) 

IF  (Kf  Y(-  ).E').0)  RETURN 
PRINT  (1,220)  ROUTINE, LNCNT 
RE  TUI  N - 

(it  RRKNS  =T  Kr Y (2) 

IF  ( K t Y ( 1 ) . F 0 • 0 1 r.ET'tH! 

PRINT  (1.2A0)  ROUTINE, LNCNT 

PETUFN  - . 

tO  RR)‘NS  = C.O 

IF  (Kt  Y(!  ) .CO.C)  PETUPEI 
PRINT  (l,2ci)  ROUTINE, LNCNT 
RETURN 
tO  Rfi HNS - P . 0 

IF  (FEYO).EG.O)  RETURN 
PRINT  (1,230)  ROUTINE, LNCNT 
RETURN 

220  FORUM  ( IH3,— '.PINS  °DS'T1VC  OVETl.DW  **,  A?  l INF  = MOI 

2<  0 FCC  AT  (1  M’,"  VMNS  NEGATIVE  OVEr(ON  ",ff  LINE  = ~,I?) 

2»  F re.  "AT  (1H","{M’'S  3.DEMIV:  UNO  FLOW  LINE  = ",  ID) 

2A0  F 0 * "AT  (IM  J.’XF.  INS  NEGATIVE  UNDERFLOW  LINE  = 'M1.'.) 

(NO 
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1 


. 1 


M / l ( UNCTION  HI  VMS  ( 1 1 , F?,l  OUT  IW  ,t  NCNT  , ) FNL  , K ! Y , 1 K l V » 

I’’  HI  IMO’I  Kt  Y C 1 
PI  HI  I S J 011  T K L Y { ** ) 

M MU’  1 1-R? 

ir  imi.n.in.i)  r*u  roorniM!.tf,Kr.,  irm  ,<i  V) 

)(  ( ( *' r Y ( .11  , ( ) . 1 1 , A ‘I1’ . ( *.  I Y < A I . . 1 1 CM  I U’CT-tCMRIHNS.IFNL  ,K(V) 

)«  < ( Kt  y ( m . < n.  it  .an ' , u,i  Yi >, ) ,1  l) ) CHI  oi>>  1 u::  tku’us,  11  t.x » ki  y> 

if  (i  if  ws.r.T . tk' y ( 1 n ni  to  or 

ir  <1  hsn.i  t.iv  i \ r » t fo  to  * o 

IF  ( (I  IMNS.l  1 . IK'  Y(  1)  ) . AND.  (Nl  MNS.F.T  . 0 . T > 

* CO  TO  OO 

Jf  ( (F  IMMS.r.l  . TM  Y(i)  ) . AN).  (K1IINS.LT  . C .)  1 

* CC'  TO  0(1 
FfTUKlI 

?o  n mu  t :*  y:d 

IF  <‘.l  Y < ) .1  0.  O',  r • TURN 

mm  mi  u.r.n  1;  1 m : t , ( nchi 
N‘  Tl'l  N 

t r him:'  1‘  * v < :*» 

IF  ( M Y (>  > . f 0 . P ) FS  TURN 
mint  (l.P.OI  MtriNt  , l NCNT 
RfTlNN 

CD  K 1 MU  0.0 

IF  (M  Y(‘  ) . I 0,'  1 f > TUN  14 
(4)1  ■ ’ U..-NC)  /I  OUT  IN.  NCNT 

n hh  n 

ro  r 1 niu  (.0 

jf  iMtf  i.tn.o)  ''•Turn 

f’fjnt  u.pooi  «0'.niN'  , incut 

rcuu  n 


pro 

( 01  1 

"1  T 

< 1 m a 1 ■■  >i  r; 

Rps*  1 I VI 

OVS  ••FLOW  ", 

1 INC  - " 

, JF.) 

?l  0 

1 0!  Ill  T 

mi 

N*  (•■  T XVI 

ovi.inoo 

i INI  « " 

,19) 

pi  ? 

rc; 

i;  1 

(|lOi"ll'i  s 

'P‘1  ’ 1 T VI 

Ullll’  •' F L IK’  • 

t 1 HC  • 

•MID 

pi  0 

F PI  ”1.  1 

(imviinaS 

II!  l.ATIVf 

UNO’.Af  LOW  ' 

’ 1 A ” » " 

L1UL  “ 

M3) 

(NO 

M .H  FUNCTION  P IVHSICI  , I P,KOUT I NC, LNCNT , IFI.'L  ,KI  Y,TK!Y> 

PI  Ml  NS  1 ON  K T Y ( (I ) 

PI  MCI. SI  ON  TKIY('t) 

R2NNT  • H-I? 

JF  (R(YUt.lC.l)  CALI.  rOUN!ir  R ( R’MNS, 1 1 NL , < TY  ) 

JF  ( ( KFY  ( J)  . TO  . 1)  , A NO  , (K  t Y (•'.  ) • F?. ?) ) CAl.l  1 HOT  'NO  ( UPIINS , I TNI  , KT  Y) 

IF  ( (K!  YU)  .' 0.  II  .AN”.  (N‘ Y('.>  .1  0.1  1)  CALL  ON*  H N C-  ( IIP?  NS , ] I NL  . Kt  Y > 

IF  (f  P*  Nji  ,6T  . T K'Y  ( 1 ) ) CO  TO  ?0 

IT  (I  Pnr.. IT. TKrV  (?)  > CO  TP  Ad 

If  ( (FI  NNS. I I .TNI  Y<  t)  ) . AN9.  (r-  ININS. CT  . 0.)  ) 

* CO  TP  CO 

JF  ( (F-riNS.CT  .TNI  Y ( A ) > .AND.  (K’NNS.LT  . 0.)  ) 

• CC  TO  00 
fvfUIFN 

?P  f ? I NS  > TKI.  Y ( 1 ) 

IF  (»  f YC  > .10.0)  f FTWON 
PRINT  (1,e?0)  R TUT INr , LNCNT 
RFllIMN 

AO  R?WU  - TKF  Y ( ?) 

JF  (H(Y«j>  .10.0)  PFTVIPN 
»RTNT  ( 1 , PA  0 ) POUTIMf , LNCNT 
PflUI.  N 

CO  P?MNS«=C.O 

JF  (M  YC-I.LO.O)  FTTIIRN 
PRINT  (l.PCO  KOUTINF, LNCNT 
Fc'HlFN 

fO  F PUNS'- 0.0 

IF  (FEYf',1  .(0.0)  FTTIIRN 
FRUIT  (l.PPO)  ROUTIN'.!  LNCNT 
P El  UK  N 


??P 

rn  in  1 

(IMP, 

“ (PM*  S 

OOSTTIVF 

pvron  00 

LTVF  «■ 

M5) 

?'  : 

» c:  ‘7  t 

nui, 

"<?Tir> 

Nf  r.  • T l V! 

OV  I'l  1 ,10 

l I Nl  = 

".I*  ) 

pi  r 

FC  N/  T 

(i**t, 

? p ■ 1 1 

•’or  ’ T T Vr 

UN);  A f L 10 

••  t l'  •• 

t " * 

l 1 I’l  1 

??r 

f cm.;  1 

(1)0, 

" <PMNN 

Nl  C.l  1 I VI 

(in o-  n low 

••  •• 

L 1 NL  r 

",  !'•’  > 

INP 

1 ? 1 


f f M.  f UNCTION  R'V'PY  ( K1 , R?  ,1'OUT  I "li.  ,LNCNT,  I*  Rl  ,KFY , TKEY) 

rim'KSTon  k t y ( *■•  > 

O I ft  I S 1 ON  T Kl  Y C *» » 

P<5Hf'Y  = Kl  R2 

IF  (KFY{?).f0.1)  CALL  * OUNDCMKOfTY,  lFNL,<rY) 

ir  ((KFYf.Yf.n.a)  . vm.  (ki  y(j.)  .n.D)  call  i hoi  r nc  i rpkpy  , i tnl ,key) 

ir  UM  y ( :\>  .n.  » . vn.  <>;t  Yd.) n>  call  cni  trnc ( krmpy , if  nl  » k eyj 

JF  (F  f FRY  «C*T  . 1 K ”Y  ( 1 ) ) ro  TO  20 

If  (f  kfVY  .IT.Tk  :y  CD  ) r-0  to  i.0 

If  < (RNMPY.LT.TKt  Y(  i)  > . A NO.  <RRM'’Y,GT..  0.)  ) 

* CO  10  CO 

IF  ( (Rff;t>Y.r.T.TNFY  Ci)  ) .ANO.  (RRMi’Y.LT.  0.)  ) 

* co  io  on 

RETURN 

?C  KRMPY- I KL  Y ( 1 ) 

IF  <*  l YF  ) ,FO.O>  P.miRf) 

PRINT  (1.220)  K0UT1NF ,LNCN1 

RETURN  

AO  K K FRY*,  1 N*  Y ( 2) 

IF  (KEY!  1)  ,F1.  0>  RETURN 
PRINT  <l,2-',n)  ROUT  IN!.,  LNCNT 
RETURN 

CO  RRNPYtO.O 

IF  UIYO.FO.DI  PFTURI) 

PRINT  (L.2G0)  ROUTINE  ,1  HCMT 
PFTURN 
FC  RRMPY - 0 . P 

IF  (KFY(li).FO.O)  RCTURN 
RR1NT  ( 1 ■ ?o  J ) ROUTINE, LNCNT 
Rf 1 URN 


220 

F Of  hf  1 

( IMP, •'■FRNPY 

P(15TT1  Vi 

ovr  :n  nw 

M 

1 1 nr  = ",15) 

?!  G 

roi  HUT 

( lM3,"<”>|f  Y 

NT G.’  T ’ Vi 

ovc.rLow 

",M," 

LINE  « " ,15) 

2f  0 

1 0‘  ft  T 

( 1 HO,  '"(K  (rY 

”05 T 1 1 vr 

UNO  -1  l OH 

LINE  ",  15) 

?f.O 

1 ( i KM 

( 1 HO  ,"R(\'T,JY 

NC  GM  I v: 

UNO**  P.  FLOW 

LINE  = ",I0) 

ENO 

rFM  FUNCTION  R1MPY ( 1 1 , R2 , ROUT  INF, LNCNT, IF NL  >KCY , TKFYJ 
rif.rfSION  K F Y ( 0 ) 
ri r.rt  eion  tkiyk) 

PIliPYtll  R2 

IF  (KEYCM.FO.l)  CALL  f OUNPCR  (RIHF'Y,  I FNL  , < F Y ) 

IF  (IKtY(T).EfT.  ))  .AND.  (Kt  YK)  .10.2))  CALI  7 WOTf-  NC  < K1  MnY,  IFNL , KEY) 

IF  ( < KIY(3)  .FT.  0)  . AND.  <KrY<(; ) .FP.  1)  ) CALL  CNEKNC  (R1KPY,  IFNL,KEY> 

IF  <r  1PPY  .r.T.TKFY  m ) CO  TO  20 

IF  (f  1 FRY  ,LT  • TK  .'Y  (?)  ) CO  TO  AO 

IF  ( <t  1MRY.LT  .KEY  UII.ANO.  (R 1MPY  «GT  . 0 . ) ) 

* GO  TO  cn 

I r (<MNPY.GT.TKl  Y (•',)>  . A NO.  (R1HPY.LT  • 0.)  > 

« GO  TO  00 
RETURN 

?f.  P 1 RPY-TK' Y ( 1 ) 

IF  (KEY(  I.FO.O)  nr TURN 
PRINT  (1, 220)  ROUTINE, LNCNT 

PEUirN  ... 

AO  RlhPY- TK1  Y (?) 

IF  t K f Y ( : ) .EO.O)  CTTIIRM 
PRINT  <1,3.0  ROUTINE, LNCNT 
P f Ulf  N 

CP  Plf'PYrO.O 

IT  ( F E Y t I.FO.D)  RFTIIPN 
fPJNT  (l,?t<0)  ROUTINE,  LNCNT 
RETURN 
fO  p 1 )>P  Y - C . 0 

IF  <KEY(!  J.E.Q.O)  RF1UPN 
FR1NT  (1,280)  R0U1INC, LNCNT 

RETURN  . 


2?r 

rpi  mat 

(IN’, 

"?1MPY 

positive* 

ovrrri  ow 

",  AS 

LINE  * ",Tr'l 

pi  r 

rc  ‘ / i 

(IN.), 

"RlT'Y 

•b  0 ' 1 I VI 

uV"  i l )H 

",f  ■ 

l INF  v " , 1’  ) 

?'  r 

ret  h/  t 

(•*'., 

” < 1 HI  V 

'0.'  • T 1 Vi 

UM  ) FI  >W 

••.A," 

l)NC  ^ *’,15) 

Pi  3 

1 Cl  11/  T 

(IHj, 

"■<15,  Y 

<rc;  l ivi 

UNO 1 <1  LOW 

" , A " 

LINE  o ",1C>> 

LUO 
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real  ru-(ci  ion  R T'ti’Y (Ri , ir,nouTiiif  jLHCNT*  i i *n  ,R!  y,  iki  y> 
rirtKSIOS  K I Y ( s » 

DIMI'SIOV  T KEY  ( ’») 

F’FI'Y:  f II? 

if  (HYui.ro.n  cou  rou»:i»FP(i*?«fY,iFNt  ,Knr> 

IF  ( Off  Y (3»  .11.  I)  .AND.  (M  Y < *.  > .(  0.?))  CAM  TUPT.Nl  <P.2M'’Y,IFNL,KFY) 

IF  ( (KEY(3) .FT. 1>  .AN  >.  (K  Y(I.).L'MI)  CALL  0MtT'.NC(  L?K1'Y,1I  NL,K»  Y> 

IF  (f  2*!I>Y  .01  ,TK  Y (1  ) ) FO  TO  20 

IT  IT  2”f>V  .1  T.TK  Y (?) > f 0 10  (.11 

ir  ( (RFMPY.LT.  T N 1 Y ( 3 ) ) .A  NO.  (N2M?Y.r,T  . 0.)  ) 

* CO  TO  63 

IT  ((R2MPY.G1  ,TKt  YU,)  ) , ANO.  (KZMPY.LT  . 0.)  ) 

* CO  TO  PO 
RF7URN 

TO  K2I  f'Y  - TKf  Y ( 1 ) 

IF  (FfYC.  ) .ro.0)  KF TURN 
FR1UT  (1.220)  ROUTINE,! NCNT 
PF1UFN 

tiT  R?hPY,=  TK':Y  ( ?) 

IF  ( R t Y ( ).n.O)  RFTURH 
FK’INl  (1,2/iC)  K1UT  3 (C , L NCNT 
RETURN 

6f  RZFPY-d.O 

ir  (Kt  YU  ) ,(  G.C)  RFTliPd 
ri?INl  (l,?f.O)  KOI*  T 2 NC,  L NCNT 
Ft  TURN 
60  R2HPY® 0.0 

IF  (FFYO  ) .f  O.C)  RETURN 
FRINT  ( 1 , 20  (I ) ROUTINE,  INCUT 
RtlUFN 

22P  FCI.MAT  (1M!1,”'??,1',Y  POR'llVF  OV'.flOW  ",  M>  , " LINC  c M,IF) 

?(0  FOINAT  ( 1 M1,"1,’TT  NFG.'TIV)  OVK’FIOW  ".CP,”  IINF  = ”«I') 

?(  u FORMAT  (lH?f”T?HPY  f,OS*7IV:  UNO!  "M  ON  ”,AAt”  1.1  Nf  = ",  TO) 

?pp  format  (iua,"-i?:irY  Ncr.nivi  und'-WLOh  ",»?,••  uni  = ”,15) 

two 

REAL  FUNCTION  KROVOtRl , f,2  (ROUTINE  , l Nf- NT  , 1FNL  , Kt  Y , TKEY) 
niF«n  sion  key  ( o t 
DIMENSION  7 Kt  Y ( A ) 

FSnVPa«l/R? 

IF  ( F f Y ( 3 ) . F 0 • 1 ) CALL  ROllNp*nR ( R "r)V0,  T FNL  , < FY ) 

IF  ( (KF  Y(  3)  . 1 1.  1)  . ANO.  (Kl  Y(l. ) ,i  O.D  ) CAl.l  1 NOfNC  (KROVP,  IRNL  , KEY) 
IF  ( (KtYI  3)  .n.  0)  .AN".  (KLY(i.>  .1  0,1 ))  CALL  CNlTr NC < PRO VO, I FHl , KEY) 
IF  (PROVO, GT  .KEY  (1 ) > F-0  10  ?0 
IF  (FNDVO.LT. TKFY <2)1  FO  TO  NO 
IF  ( (M  OVO.L1 ,1K>  Y( 1) 1 .AND. (KROVO.CI . 0.1 ) 

* f.r  TO  60 

IF  ( (NN  0V0.G1  . TKF  Y ( •» ) ) .AND.  (RKOVP.LT . 0 . > ) 

« GO  10  00 
RFTUFN 

20  F'-LVD  = TNFY<1> 

IF  (FI  YU  ) .to.  P)  RETURN 
FOIN1  (1,220  ROUTINT, INCUT 
RflUr.N 

( n rt'i!vt'  = iKrY(2) 

JF  ();IY(:'.I  .EO.  0)  PCTUPN' 

F~  I NT  (1,?(>0)  ROUTINE, INCNT 
RETURN 

(P  FSDVrsO.li 

IF  (Ft  YU.)  . CO. C)  RETURN  . . 

FT1NT  (1,260)  ROOT  INC , l.  NCNT 
RETURN 
pp  rwGVP=p.n 

IF  (K£Y(M .ro. 0)  RETURN 
FR I N1  (1.2S0)  ROUTINE,  (NCNT 
Rt  TURN 

??"  FOt  M(  T (1*10,  “VMVP  "OS’TlVr  OVI  TION  ",AA,"  IIK'F  = ”,I6) 

?.  0 rO‘“AT  <1M7,"U'0VP  N'F.’TIVF  OVi'"l  ON  IXH!  " ",1M 

?.  r F(.  “M  ( l*M,"UvOVl  ’OF  TIV«  UHOt-)  Iv.M  “ l i Ni  ; ",  1‘  > 
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P!  AL  FUNCTION  KIPVOU  1 ,R2, ROUT  IMF, LNCNT,  IFNL.KF  Y.TKCY) 
run  sio  i 
PIKE)  S 1 UN  1 Kt  Y < 

R1 PVP=1  1/R2 

T f (KfY(').fo.n  rm  rout!OEr:tkif'vn,irKi,<FY» 

IF  ( (Kl  Y (3)  .FT.  M .AT'.  (Kl  Y<»  > ,t  0.  ?>  > CALI  T wpT  l.UC  ( RIO  VO,  IFNL  » KE  Y) 
IF  ( ( KFY  ( 3)  . ‘ 1,  A)  , ANO  » (K!  Y (•',)•  i 0,  J > > CALI  ONETP.MC  (R1  OVD, IFNI , KfcY) 
IF  (HCVO.GI  .TREY  Cl  > ) <"0  TO  20 
if  (i  lrvn.LT.TK'Y  (") ) ru  to  i. o 
ir  < (FtOVO.LT.TKI  Y ( 3) ) . AND. (KIOYO.GT . 0 .) ) 

• CC  10  CO 

IF  < (RIDVO.GT.TKEYU)  > , ANl).  (K1UV0.LT . 0 .) ) 

» CC  TO  00 
F FT  urn 


20  F 1 PVP  -TK*  Y(l) 

IF  (F | Y ( . ) , £ 0 , t ) RETURN 
PRINT  (1,2??)  POUTIN': , LNCNT 

petufn 

*s  C FllAT:  TK!:V(?) 

IF  (KLY(‘.).fO 
PMNT  (1,21,0) 
f E1UFN 
60  RlOVCsO.C 

IF  (FLY  C. ) .f  0. 0)  RETURN 
PRINT  (1,260)  ROUTINE,  INC  NT 
RETUf  N 


0)  PrTUPN 
ROUTINE , LNCNT 


,.-7Uf  N 

ucvc-o.o 

IF  (KE  Y (:  ) • FO . 0 ) RETURN 
FRIMT  <1,?C0)  ROUTINE, LNCNT 
PETUFN 

rcriTT  (lHI,"110Vr)  POS^TIV'  OVCTLOM  ”,A?,“ 
rO'KH  (DOV’^lOVD  NFGATIVF  OVERFLOW  ".AS," 
FCTHM  (lHI.’^nv’O  °0?  *T  I VE  UNDr\F  LOW 
I Of  HAT  ( 1 HO , " <1  ’)V0  NEGATIVE  UNDERFLOW  ”,Af,‘ 
FNO 

PFAl  F UNO T TON  R2PVP (FI , 1 2, ROUTINE, LNCNT  , 1 FNl 
OIKEISION  KFY  ( 8 ) 


l INF  = M5) 

L 1 Nf  = V,  15) 
LINE  = TG ) 
LINE  = 


A20VrsRl/I? 

IF  (Kf  Vlil.fQ,]  > CALI  f OHI.’OERin'TVO,  I FNl  ,<rv) 

IF  ( (KtY(3)  . FT.  .))  .ANO.  (KEY(N  > ,Er>.?)>  CALL  T WOT -NC ( R20VD, IFNL , KEY) 
IK  ((KFY(3).FT.O)  .A  NO,  (KfY(A).E').l))  CALL  ONLT1  NC  (R2PVO, IFNL , KEY) 
IF  (F2CV0.ST  .TKT.Y  (1)  ) Co  TO  20 
IF  ((  2ftVO.LT.TKFY  (?)  ) CO  TO  1,  ? 

IF  ( (K20V0.LT .T  KEY ( I) ) . ANO. (K20VP.GT . 0.)  ) 

* CO  TO  60 

IF  f (R?DV0.GT.TK£  Y ( 4 ) ) .AML).  (K20V0.LT  .0.)  > 

* CO  TO  CO 
FETUf  N 

2C  F2PVC=TKEY(1) 

IF  (KEY(6)  .FO.O)  RETURN 
F MINT  (1,220)  RIUT1NF, LNCNT 

KFTUfN  - - ~ 

l,r  p?tvr  = TKFY(2> 

IF  IKIYP  ) .FO.O)  RETURN 
PF1PT  (1,2,0)  ROUTINE, INCUT 
PE1UFN 
Ct  K2l;vr~0.0 

IF  (KEY  IV ) ,EO,C)  RETURN 
FEINT  (1,200  ROUTINE, LNCNT 
RETURN 

e?  R?ovr=o.o 

IF  (KEYC  ) .EO.tj)  RETURN 
PRINT  ( 1 , ?A  n>  ROUTINE, LNCNT 
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Pr  A l E UNOTtON  I'Nf  XP  (PI  ,R2, ROUTINE, INC  NT,  rrM  ,R|  Y ,TKCY) 

PIKE.  NS  ION  Kl'Y  ( P) 
riK>  |,M0:(  T X L Y ( •* ) 
rke  xi’-f<i-  »rc 

IF  IFF Y m .10. 1)  CALI  f'OUIItl FR  (RrTXn,IFNl,KEY) 

IF  « (KEY(3)  .M.EI)  .Ain,  (KEYHi)  .M.O  > CAI  l T WOTrRC < RFEX P, IT Nl , KEY) 

IF  <<KfY«M,F').*»  .A’JO.  (KEYN  > ,r0.5>>  FALL  CNFTI  HC  ( PKf  XP,  IFf.L  » KF  Y) 

IF  CF  TFXP  .GT.  T KEV  (1}  ) <-.0  TO  CO 

IF  <mxr>,t  T.  TR-.Y  (C)  ) r.o  TO  AO 

JF  ( (FPEXP.ET  ,KFY|  (>  ) . AMO,  (RRCXP.GT  . 0 .)  ) - - 

* (••0  TO  OO 

IF  ( (RF.i  XI'. 01  . 1 Nt  Y ( A,  > ) .A  NO.  (RRE  XP. LT  . 0 . ) ) 

* CO  TO  80 
RETURN 


?o  rRrxrrTxrvd) 

IF  ( KE  Y C ) • FO(  0)  HTTIIRK 
PRINT  (l.CCO)  ROUTINE, LNCNT 

RETURN  ... 

1*0  rREXF.-TKf Y<2) 

IF  m Y(-) ,FO.O>  RETURN 
PRINT  (l.CtO)  ROUT INF, LNCNT 
RETURN 

*0  KRFXPtO.O 

IF  <KEY(!>.EO.P>  FCTURN 
FR INI  ( 1 i Co  0 ) ROUTINE, LNCNT 
RETURN 

np  irexp-p.o 

IF  (KCYC  ) .1.0,0)  RETURN 
FEINT  (dCCd  ROUTINE, LNCNT 
RETURN 
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END 
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RETURN 
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M.AL  FUNCTION  ROT X"(R1 ,12,1  OUT  I Nf  , LHC  NT  , 1 F NL  , Kl  Y , TKL'Y) 
PI  MU'S  JON  *Ct  Y t rt  > 
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' CO  TO  SO 

return 

R?‘.  XI  -IKCY(I) 

IF  (Kf  YC  ) .1,0.0)  K!  TURN 
FF  7 ITT  (1,?L'P)  ROUT  1 IIC  j t NCNT 
RETURN 

f 2 1 X f TKEY(?) 

JF  Oh(:  ) .10 
rSINT  (1.2A0) 

RETURN 
R21  Xf  t (i.O 
jr  (Ft y(  > .fo 
PRINT  ( 1 i 260) 

RFlUr.N 
RJIXF-O.O 

if  (fey(m.fo.o)  p.rTUPii 

FR IK'T  (i.-'SD  ROUII'C,  I NCNV 
RETURN 

( iHO,"??rxr 
( 1 HO , "\?CXP 
dMc,"  i?:xp 
UHa,";L'EXP 


0)  RETURN 
ROUTIN'", l NCNT 


n)  RETURN 
ROUTINE, LNCNT 
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CALI 
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Q)  RETURN 
ROUTINE , l. NCNT 


IF  ( (KEY!  3)  .1  1.  1)  .A  NO,  IK  EY  (..).'  0.  D ) 

IF  ( (KEY  (3)  . n.  U .A  NO.  (KiY(t),E0.1>) 

IF  (f AEGN.G1 .TK  Y (1 ) ) CO  TO  20 
IF  (rASGM«LT,TKrv (?) ) FO  TO  Ml 
IF  ( (RASGN.LT.TN!  Y( 3) ) .ANO.  (KASCN.GT 

> GC  10  ( 0 

IF  ( (RASGN.r.T  .1  KI  Y (•♦)  ) . AND.  (KASCNaLT  . 0 .)  ) 

> CP  TO  SO 
RETURN 
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RETURN 
RASf.R-  C . 0 

IF  (U  YC  I .FO.  0)  FETUP.N 
PRINT  (1,260)  ROUTINE, l NCNT 
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MSGK-n.3 

IF  (KEY (f ' . FT. 0)  RETURN 
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RETURN 
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Appendix  C 

Mathematical  Function  Approximations 


In  chapter  A,  routines  were  presented  which  were  "optimal"  for  the 
AN/AYK-15A  flight  computer  in  the  sense  that  they  provided  the  required 
accuracy  specified  by  reference  57  while  requiring  as  little  execution 
time  as  possible.  The  routines  presented  in  this  appendix  were  also 
analyzed  and  were  deemed  to  be  inferior  to  those  presented  in  chapter  A 
for  the  AN/AYK-15A  digital  processor.  For  other  applications,  however, 
the  routines  presented  in  chapter  A may  not  be  satisfactory.  Routines 
for  approximating  the  square  root,  sine,  and  cosine  functions  are  pre- 
sented in  this  appendix,  along  with  some  additional  references. 

Sine  Approximat ions 

For  the  sine  function,  Taylor  series,  minimax,  polynomial  fraction, 
continued  fraction,  and  Chebyshev  expansion  approximations  were  tested. 
Each  proposed  solution  was  executed  using  the  n-bit  simulator  with  a 23- 
bit  mantissa  (single  precision  for  the  AN/AYK-15A)  being  specified. 

Where  applicable,  Horner's  Rule  for  nested  multiplications  was  applied 
in  evaluating  polynomials,  even  though  solutions  are  not  shown  as  such. 

Sine  Solution  JL.  This  solution  is  a truncated  Taylor  series  approx- 
imation of  degree  7 and  is  computed  by 

sin(X)  = AX  + BX3  + CX5  + DX7  XeJ\),r/2j  (C-l) 

with  the  coefficients  being  shown  in  column  2 of  Table  C-l,  which  fol- 
lows the  sine  solutions.  This  approximation  was  not  evaluated  as  a pos- 
sible candidate  for  use  in  the  F-16  software,  but  rather  because  it 
appears  in  software  being  tested  for  use  in  the  A-10  aircraft. 

Sine  Solution  2.  This  solution  is  a truncated  Taylor  scries 
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appi  oxlra.il  i on  of  degree  II  which  was  reduced  1 o degree  0 by  power  series 
economi  zat  ion  (Kof  72:7S).  This  approximation  is  computed  by 

sin(liX/2)  - AX  f J'X1  -•  CX'  + PX  + KX°  Xi  ^0 , ij  (C-2) 

with  the  coefficients  being  shown  in  column  3 of  Tabic  O 1 . This  ap- 
proximation was  discarded  because  l>  it  did  not  minimise  cither  the 
maximum  relative  or  absolute  error,  ami  ?)  it  required  more  multiplica 
Cions  and  additions  than  we*  c required  t o meet  t ho  accuracy  specifics— 
t ions. 

Sine  Solnt  ion  3.  This  solut  ion  is  a t runeated  Tavlor  series  ap- 
proximation of  degree  13  which  was  reduced  to  degree  7 bv  three  power 
series  economi  7.at  ions.  This  approximat  ion  is  computed  bv 

3 

si»(ii\/:l)  - AX  -f  Ux'  + t:xS  + px'  Xcjo.lj  (C-3) 

with  the  coeft  icients  being  shown  in  column  A of  Table  1 . This  ap- 
proximat ion  was  discarded  because  the  maximum  absolute  and  relative 
error  hounds  were  greater  than  those  of  the  minimax  approximat ion  pre- 
sented in  chapter  A. 

Sine  Solution  A.  This  solution  (Kef  7.1:82)  is  a rational  approxi- 


mation of  the  form 





Since  this  solution  was  obtained  from  a truncated  fifth-order  Taylor 


L 


series,  the  approximate  truncation  error  e is 


0.0046815  X 


,7 


(C-5) 


1 + 0. 12337  X 


This  approximation  was  evaluated  to  be  used  in  comparing  results  ob- 
tained from  continued  fraction  representations.  Since  a fifth-order 
Taylor  scries  approximation  provided  better  accuracy  than  this  rational 
fraction,  and  this  rational  fraction  provided  better  accuracy  than  its 
representation  as  a continued  fraction,  further  testing  of  rational  and 
continued  fraction  approximations  was  not  conducted. 

Sine  Solution  j>.  This  solution  is  the  continued  fraction  represen- 
tation of  the  rational  fraction  presented  as  solution  4.  It  was  assumed 
that  a floating-point  division  for  the  AN/AYK-15A  would  take  approxi- 
mately 3/2  as  long  as  a floating-point  multiplication,  so  there  would 
be  no  saving  in  execution  time  using  a continued  fraction  over  a Taylor 
series  polynomial.  This  solution  (Ref  72:84)  is  computed  as 

sin (ttX/2)  = A/X  - B \ XrJ^O.lj  (C.-6) 


where 


A = - 7ir/6 


20 


2f  (D 

(I 


The  values  for  these  coefficients  are  shown  in  column  6 of  Table  C-l. 
This  solution  had  larger  absolute  and  relative  errors  than  either  a 
fifth-order  Taylor  scries  or  the  corresponding  rational  approximation 
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presented  as  solution 


Sine  Solul ion  fe.  This  approximation  is  an  expansion  in  Chebyshev 
polynomials  up  to  degree  9,  resulting  in  an  expression  of  odd  powers  of 
X up  to  degree  9 (Kef  72:71).  This  method  should  be  distinguished  from 
Chebyshev  approximation  which  produce  minimax  polynomials.  This  approx- 
imation using  Chebyshev  polynomials  provides  a nenr-minimax  solution 
for  the  absolute  error  and  is  computed  as  follows: 

si n(7tX/2)  = AX  + BX3  + CXS  + DX7  -I  EX9  Xr[°,l]  (C-7) 

The  values  of  the  coefficients  are  shown  in  column  7 of  Table  C-l.  This 
approximation  was  not  recommended  for  use  with  the  AN/AYK-1SA  for  two 
reasons:  1)  it  did  not  attempt  to  minimize  the  maximum  relative  error, 
and  2)  since  seventh-order  polynomials  existed  which  provided  sufficient 
accuracy,  the  ninth-order  approximation  with  the  associated  extra  addi- 
tion and  multiplication  was  deemed  to  require  too  much  execution  time. 

Sine  Solut i on  7.  This  solution  is  a seventh-order  minimax  approx- 
imation which  minimizes  the  maximum  absolute  error,  as  opposed  to  the 
solution  presented  in  chapter  4 which  .minimizes  the  maximum  relative 
error.  Since  the  sine  function  wan  used  in  multiplications  and  divi- 
sions, the  polynomial  which  minimized  the  maximum  relative  error  was 
chosen.  In  other  circumstances,  this  routine  might  be  preferred.  This 
approximation  (Ref  71:117,202)  Is  computed  as 

sln(nX/2)  = AX  + BX3  + CX5  + DX7  Xi[o,lj  (C-8) 

where  the  values  of  the  coefficients  are  shown  in  column  8 of  Table  C-l. 

Sine  Solution  8.  This  solution  is  a ninth-order  minimax  approxi- 
mation whicli  minimizes  the  maximum  relative  error.  It  was  tested  to 
determine  whether  or  not  more  accuracy  could  be  obtained  without  using 


Siiu'  Solution  1 Sine  Solution  2 Sine  Solution  3 Sine  Solution  A 


p 


extended  precision.  This  solution  (Ref  71:117,204)  is  computed  ns 

sIn(irX/2)  = AX  + BX3  + CX5  + PX7  + KX9  Xr^O.lj  (C-9) 

where  the  values  of  the  coefficients  are  shown  in  column  9 of  Table  C-l. 
Using  this  solution,  the  maximum  relative  error  was  less  than  that  pro- 
duced using  the  seventh-order  polynomial  presented  in  chapter  4. 

Machine  roundoff  error  becomes  more  apparent,  however,  since  the  result 
is  only  better  by  a factor  oi  approximately  6,  instead  of  giving  two 
more,  decimal  digits  of  accuracy  as  cited  by  Hart  (Ref  71:117). 

Cosine  Approximation 

Only  one  cosine  solution  was  tested  in  addition  to  that  presented 
in  chapter  4.  This  solution  (Ref  71:118,207)  is  an  eighth-order  mini- 
max approximation  which  minimizes  the  maximum  absolute  error  and  is 
computed  by 


cos (X)  = A + BX2  + CX*  + DX6  + EX8 


[o.TT/2] 


Xc|0,ir/2|  (C-10) 

The  values  of  the  coefficients  are  shown  in  Table  C-2.  This  routine 
was  not  chosen  because  tbe  relative  error  for  X near  tt/2  exceeded  10 
which  was  the  error  bound  specified. 


A 

+0.999999953464 

B 

-0.499999053455 

C 

+0.0416635846769 

D 

-0.0013853704264 

E 

+0.00002315393167 

Table  C-2.  Cosine  Coefficients 
Square  Root  Approximations 

The  square  root  approximations  differ  in  the  way  in  which  the 
initial  approximation  to  the  square  root  is  obtained.  The  methods 
presented  use  Newton  iterations  to  obtain  the  desired  accuracy.  Several 
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mini  max  polynomials  arc  given  by  Hart  (Kef  71:94-95)  for  use  in  obtain- 
ing an  initial  approximation.  Although  these*  approximations  were  not 
evaluated  in  this  investigation,  they  merit:  consideration  also. 


Square  Root  Solution  1.  The  initial  estimate  Y for  this  method 
(Ref  69:31)  it.  accomplished  using  decimal  scaling  and  a rational  func- 
tion evaluation. 


Y - 
o X+4 


0.1  < X < 10.0 


(C-ll) 


Since  the  AN/AYK.-15A  computer  is  a binary  machine,  this  solution  was  not 
implemented  in  a test  environment  using  the  n-bit  simulator. 

Square  Root  Solut ion  2.  This  method  (Ref  69:33)  uses  a Fade 
approximation  and  exponent  scaling  to  obtain  an  initial  estimate.  As 
in  the  method  proposed  as  square  root  solution  2 in  chapter  4 , this 
method  shifts  the  mantissa  to  obtain  an  even  exponent.  The  exponent  is 
then  divided  by  two,  and  the  mantissa  f is  used  * n the  Fade  approximation 


f . A _ .i  + „/f 


(C.- 12) 


where  | D | , the  relative  error  term,  is  loss  than  0.0014.  If  ft^O.25, 
0.5oj,  then  the  coefficients  are  computed  as  shown  in  column  2 of  Tab  1 * 
C-3,  and  if  fe£o.5,l.oj,  then  the  coefficients  are  computed  as  shown  ii 


column  3 of  Table  C-3.  This  method  requires  only  one  iteration  of 


Newton's  method  to  obtain  the  desired  accuracy  (10  ).  Since  more  core 

storage  and  logic  are  required  in  the  initial  Fade  approximation,  the 
method  presented  as  square  root  solution  2 in  chapter  4 was  preferred. 
Methods  such  as  this  one  which  require  only  one  Newton  iteration  to 
obtain  a desired  accvtracy  merit  further  study,  however. 
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a 


i- 


HU 


A 

11 


1. 792843 
1.  70/469 


0.  SO 


0. SO  < f < 


2.  5 354 6 3 
4.82945/ 


J.OO 


1.0/14/9 


2. 142858 


Table  0.-3.  l’.'uli'  (ioefflelents 

Squat  e I’oot  Solut  inn  3.  Tills  solution  (I’ei  5b: 42)  usos  two  Now  I on 
i lor.it  I ons  to  obtain  t ho  required  a o on  racy . Tlio  Initial  approx  i mar  1 on 
Is  obtained  using 


0.154116  i 1.893872  X 
'o  1,0  I 1.047988  x 


1/16  v X < l 


(0-13 


vitb  tin-  initial  approxiinnt  ion  Y having  a relative  orror  loss  than 
0.0/S.  Tlio  o:  ror  a 1 1 o r two  Newton  lioratlons  is  approximately  tlio  same 
as  that  using  solution  2 prosontod  in  chapter  4.  With  this  solution, 
uni  o .1  foil  was  expended  to  obtain  tlio  initial  appt  ox  i tn.it  ion,  and  moro 
eoe I t i e louts  nrc  required  to  bo  ston'd,  so  tlio  mot  hod  prosontod  in 
i liapt  i i 4 was  proforrod. 

Squiro  Knot  Solution  4.  This  approximation  (Kef  70:2-3)  foi  (In' 
squ.no  root  (unction  uses  a relatively  poor  approximation  to  the  square 
loot  and  thou  rellos  on  a sufficient  numbor  of  Nowton  Iterations  to 
obtain  tin*  lequircd  accuracy.  This  method  has  one  advantage  in  that  no 
prc'itorcd  ci>o  t f i e I cut  s arc  required.  The  initial  approximation  Y^  Is 
obtained  for  the  square  roo*  of  X by 

Y0  - X/2  (014 

To  ( InJ  tiie  square  mot  of  3000,  however,  eight  Newt  on  iterations  are 
required  to  yield  a relative  error  loss  than  HI  *.  Therefore,  this 


t out  i no  e a s d i seardoil. 
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Errors  due  to  finite  wordlength  are  unavoidable  when  aircraft  signal  processing 
operations  such  an  flight  control,  navigation,  and  five  control  arc  implemented 
on  a digital  computer.  To  reduce  these  errors  to  tolerable  levels,  longer  word- 
lengths  can  sometimes  be  employed.  The  effects  of  some  of  the  errors,  such  as 
those  due  to  arithmetic  series  truncation,  machine  roundoff,  and  quant  last (on  of 
system  coefficients,  can  be  lessened  somewhat  hv  appvopiiato  numerical  analysis 
techniques.  , 
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20.  \n  n-blt  simulator  which  runs  on  Control  Pat  a Corporation  (CDC)  6600/CYHKR 
h\  computer  systems  was  modi f Led  and  then  used  to  evaluate  the  accuracy  of  a 
flight  navigation  routine  coded  in  FORTRAN.  The  routines  were  executed  without 
the  simulator  to  obtain  results  used  for  benchmarking.  The  n-bit  simulator  was 
employ'd  to  simulate  the  numerical  characteristics  of  the  AN/AYK-15A  digital 
pvocenaor.  Error  plots  were  constructed  which  show  the  maximum  errors  occurring 
within  small  plotting  Intervals  plotted  against  each  individual  input  value. 
These  plots  were  used  to  aid  visually  In  analyzing  the  error  characteristics  of 
the  avionics  routine  as  it  would  be  implemented  on  the  AN/AYK-15A. 

A critical  analysis  of  the  error  plots  obtained  showed  that  routines  which  are 
coded  using  single-precision  floating-point  arithmetic  are  prone  to  errors  which 
exceed  the  error  bounds  specified  for  the  routines.  This  occurs  even  though 
range  reductions  in  the  trigonometric  function  approximations  are  accomplished 
using  extended  precision. 
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